K8sGPT Tutorial - When Kubernetes Meets AI

In this blog we’ll explore K8sGPT, a powerful tool that brings the capabilities of AI to change the way you manage Kubernetes.

Kunal Verma·July 31, 2024·16 min read


In recent years, AI has taken the world by storm. With tools like ChatGPT and platforms developed by OpenAI leading the charge, AI is being used in many industries to make work easier, provide helpful insights, and boost productivity. According to The State of AI 2023 by McKinsey, AI adoption has increased by over 60% in the past year alone, with businesses using AI to stay ahead and innovate!

Given how AI is making a significant impact across industries, it makes sense to ask — why not use AI in managing Kubernetes?

Kubernetes is one of the most widely adopted open-source container orchestration platforms. Its powerful features make it a top choice for automating the deployment, scaling, and management of containerized applications. However, as Kubernetes environments grow larger, they can become complex and difficult to manage. Troubleshooting issues at scale can be particularly challenging.

This is where AI can make a big difference! By adding AI to the Kubernetes workflow, we can make the management process smarter and more efficient. AI can help diagnose and resolve issues quickly, automate routine tasks, and provide insights that enhance decision-making.

Building upon this narrative, in this blog we’ll explore K8sGPT, a powerful tool that brings AI to Kubernetes management, making it easier to solve problems, make decisions, and keep things running smoothly.

Introducing K8sGPT - Kubernetes Troubleshooting with AI #

According to the website:

K8sGPT is a tool for scanning your kubernetes clusters, diagnosing and triaging issues in simple english. It has SRE experience codified into its analyzers and helps to pull out the most relevant information to enrich it with AI.

K8sGPT is a CNCF sandbox project designed to simplify Kubernetes management using AI and natural language processing. It integrates with various AI backends, such as OpenAI, Azure OpenAI, and Google Gemini, to provide clear and actionable insights into your Kubernetes environment. These insights are presented in a user-friendly format, making them easy to understand and act upon.

Some of its key features include:

Cluster Scanning - Automatically checks your Kubernetes clusters to find any issues.
Issue Diagnosis - Quickly identifies problems and explains them in simple language.
Actionable Advice - Gives practical tips on how to fix issues.
Anonymization - Protects sensitive data by hiding it during analysis.
Extensibility - Allows you to add custom analyzers to meet your specific needs.

These features make K8sGPT a valuable tool for anyone managing Kubernetes environments. It helps you find and fix problems faster, automate routine tasks, and make better decisions. In the next sections, we'll see how these features work through practical demos and real-world examples.

Prerequisites #

Before we begin exploring K8sGPT, here are a few things you’ll need:

kubectl installed
A Kubernetes cluster (we’ll be using minikube for this tutorial, but feel free to choose any other tool)
helm installed
ollama installed - we’ll be using this as the AI backend for K8sGPT (more on this later)

Installation #

There are two main ways you can use K8sGPT to analyze your Kubernetes cluster:

Using the CLI
Using it as an In-cluster Operator

In the upcoming sections where we discuss different functionalities of K8sGPT, we’ll mainly focus on using the CLI.

Depending on your operating system, there are various methods of installing the k8sgpt CLI, which you can check out in the installation guide. We will be using the following commands to install it via Homebrew:

brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt

Once the installation is complete, use the following command to verify whether it was installed correctly:

k8sgpt version

# Output
k8sgpt: 0.3.39 (Homebrew), built at: ...

Authentication #

Before we move forward with using K8sGPT to analyse our cluster, we need to authenticate it with an AI backend. A Backend (also called Provider) is a service that provides access to the AI language model. K8sGPT supports a lot of different AI backends — so there are several options to choose from!

Tip: Each AI backend has its own strengths and weaknesses, so it is important to choose the one that is right for your needs.

To know all the supported AI backends, we can use the k8sgpt auth command as shown below:

k8sgpt auth list

# Output
Default: 
> openai
Active: 
Unused: 
> openai
> localai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonsagemaker
> google
> noopai
> huggingface
> googlevertexai
> oci
> watsonxai

For this tutorial, we’ll be using Ollama to run the Llama 3 (latest as of today) LLM locally on our machine — at zero cost!

Note: OpenAI is the default backend for K8sGPT and is recommended by the community for its powerful language models and accurate results. However, for local testing and demo purposes, free, open-source options are also supported, such as Ollama, Local AI, and FakeAI. Remember, better language models lead to more accurate results!

To get things started, make sure the Ollama server is up and running using the following command:

ollama serve

To authenticate K8sGPT with ollama, we’ll be using the k8sgpt auth command as shown below:

k8sgpt auth add --backend ollama --model llama3 --baseurl http://localhost:11434

# Output
ollama added to the AI backend provider list

Here, we provide the base URL address as http://localhost:11434, which is the default address of the Ollama server.

To verify the authorization, we can use the following command to check the status of the ollama backend:

k8sgpt auth list

# Output
...
Active: 
> ollama
Unused: 
> openai
> localai
> azureopenai
> cohere
...

Note: To see the full list of different k8sgpt CLI commands, refer to the documentation.

We have successfully authorized K8sGPT to use Ollama as an AI backend 🎉

Basic Analysis #

Let us explore the core functionality of K8sGPT, which is scanning and debugging issues in a Kubernetes cluster.

Here, we’ll be taking two very simple scenarios to demonstrate its capabilities:

Broken Pod #

Let us deploy a “potentially” problematic pod to our Kubernetes cluster. Save the YAML manifest below as broken-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: hungry-pod
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox
    resources:
      requests:
        cpu: "1000"
    command: ["sh", "-c", "echo Hello Kubernetes! && sleep 3600"]

Use the following command to apply this to our cluster:

kubectl apply -f broken-pod.yaml

You will notice that the pod is currently in a Pending state and remains that way — which means there’s something wrong here!

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/3606ae70-0335-4741-acbb-053486142754.png align="center")

If you’re familiar with the basic concepts of Kubernetes, your default approach will be to check the pod events using the following command:

kubectl describe pod hungry-pod

But let’s see how far we can go using AI to learn more! Run the following command to scan the entire cluster and find issues:

k8sgpt analyse

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/661376ef-1465-4ebf-ba40-6c61fcd2bf17.png align="center")

According to this, there are no nodes with sufficient CPU resources available to schedule the incoming pod. That’s helpful — let’s take this a step further. Run the following command to get additional information about the error and get recommendations by AI on how to fix the issues:

k8sgpt analyse --explain --backend ollama

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/dfee3bd6-e61d-4b3f-a4da-213ba6f9d47f.png align="center")

Here are two things the AI provides for each error:

Explaining the error in simple language
Giving potential solutions to debug and fix the issue

We can try all these solutions to solve our pod issue, but to me, the simplest solution is the 3rd one — adjusting the pod’s CPU requests.

Let us first check the CPU capacity of our node by using the following command:

kubectl describe node minikube

# Output
...
Capacity:
  cpu:                11
  ephemeral-storage:  61202244Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             8029108Ki
  pods:               110
...

Edit the pod manifest and set the CPU request to a value within the node’s capacity (at most 11 cores here). Here’s the updated YAML manifest:

apiVersion: v1
kind: Pod
metadata:
  name: hungry-pod
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox
    resources:
      requests:
        cpu: "5"
    command: ["sh", "-c", "echo Hello Kubernetes! && sleep 3600"]

To fix the issue, we’ll first need to delete the existing pod and then apply the new manifest to create a fresh pod:

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/787da15c-5c96-4786-aef9-a79feb33759f.png align="center")

Voilà, our pod is up and running successfully 🎉
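
The scheduling failure in this scenario comes down to simple arithmetic: a pod can only be placed on a node whose CPU is at least as large as the pod's request. The sketch below illustrates that comparison in Python. The helper names `parse_cpu` and `fits_on_node` are hypothetical, and the check is deliberately simplified: it compares against the node's total capacity, whereas the real scheduler also accounts for CPU already requested by other pods.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity ("5", "0.5", "500m") to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 1000m equals one core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def fits_on_node(request: str, node_cpu: str) -> bool:
    """Return True if the requested CPU does not exceed the node's CPU."""
    return parse_cpu(request) <= parse_cpu(node_cpu)


# Values taken from the manifests and the node description in this tutorial.
print(fits_on_node("1000", "11"))  # original request of 1000 cores -> False
print(fits_on_node("5", "11"))     # updated request of 5 cores -> True
```

A request of "1000" means one thousand full cores, not 1000 millicores, which is why the original pod could never be scheduled on an 11-core node.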

Broken Service #

Let us have a look into another scenario and see how K8sGPT helps us here. Apply the YAML manifest below that creates a new service and a pod:

apiVersion: v1
kind: Service
metadata:
  name: pod-svc
  namespace: default
spec:
  selector:
    app: ngnx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  namespace: default
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

Once applied, you’ll notice that there are no errors. The pod is in the Running state and the service has been created:

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/ad2ccda5-768c-4a0f-9bf7-4b5352adcafe.png align="center")

But sometimes looks can be deceiving! Let us use K8sGPT to scan for issues that may not be visible and provide potential solutions to fix them:

k8sgpt analyse --explain -b ollama

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/0100e745-73be-4b87-981d-1633c77845dc.png align="center")

Caught it! We have an issue with the service - there are no available pods that match the service's selector criteria (labels in this case) to route traffic to.

To fix this, we can change either the service’s selector or the pod’s label. Use the following command to quickly change the pod label to match the service:

kubectl label pod nginx-pod app=ngnx --overwrite

# Output
pod/nginx-pod labeled

Now, if we run the k8sgpt analyse command again, it won’t detect any errors with our pod and service — because we’ve solved it 🎉

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/39076c0e-5b68-4747-8dc6-a063bb66acff.png align="center")
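
To see why the service had nothing to route traffic to, it helps to spell out the matching rule: a pod is selected only if every key/value pair in the service's selector appears in the pod's labels. Here is a minimal Python sketch of that rule. `selector_matches` is a hypothetical helper written for this post, and it only covers simple non-empty equality selectors like the one used here.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "ngnx"}  # from the pod-svc manifest (note the typo)
pod_labels = {"app": "nginx"}       # from the nginx-pod manifest

print(selector_matches(service_selector, pod_labels))       # False: no match
print(selector_matches(service_selector, {"app": "ngnx"}))  # True: after relabeling
```

Because the comparison is an exact string match, a one-letter typo is enough to leave a service without any matching pods while both objects still look healthy.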

Using filters and flags #

In the previous section, you might have noticed that the k8sgpt analyse command gives a list of all the issues within our cluster, covering all Kubernetes resources. In a demo scenario (similar to what we have here), we may only have a few Kubernetes resources deployed in our cluster — making it relatively easier to navigate our target resources.

But let’s get real here! In a real-world production scenario, you may have 1000s of resources deployed, and it may get difficult to navigate and find them following this format, right?

Interestingly, K8sGPT does this for you using filters. Filters are a way of selecting which resources you wish to be part of the default analysis.

To check the list of available filters, use the following command:

k8sgpt filters list

# Output
Active: 
> MutatingWebhookConfiguration
> Node
> HorizontalPodAutoScaler
> Deployment
...
Unused: 
> Gateway
> HTTPRoute
> PodDisruptionBudget
> GatewayClass

Note: These filters correspond to specific analyzers written in the K8sGPT codebase. You can explore the codebase to learn more.

Here are a few ways you can use filters with the k8sgpt analyse command to select specific resources:

Filter by resource (example: analyse all Pods in the cluster)
```
k8sgpt analyse --explain -b ollama --filter Pod
```
![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/5064acf5-659e-460f-a538-43433366123e.png align="center")
Filter resources by a specific namespace (example: analyse all resources in the default namespace)
```
k8sgpt analyse --explain -b ollama --namespace default
```
![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/219e2667-cbc8-4c81-b21f-fc3d3d03175e.png align="center")
Providing multiple filters (example: analyse all pods and services in the default namespace)
```
k8sgpt analyse --explain -b ollama --filter Pod,Service -n default
```
![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/8719ae4b-ce31-4e23-828a-b79a33c0d08d.png align="center")

Apart from the built-in filter support (which we’ll also expand in the upcoming section), there are two additional flags worth highlighting:

Output JSON Flag

The --output json flag generates the analysis output in JSON format. This is particularly useful when you want to integrate K8sGPT with other tools or automate tasks, as JSON is a widely accepted format for data exchange. By using this flag, you can easily parse and process the output with scripts or software applications, allowing for seamless integration into your existing workflows. Here's how you can use it:
```
k8sgpt analyze --explain -b ollama --filter=Service --output=json
```
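
As a rough illustration of the kind of automation the JSON output enables, here is a small Python sketch that turns a report into one line per problem. The sample document is hand-written for this example rather than captured from a real run, and the field names (`results`, `kind`, `name`, `error`, `Text`) are assumptions about the output shape, so check them against what your K8sGPT version actually prints.

```python
import json

# Hand-written sample; the field names are assumptions, not a guaranteed schema.
sample_output = """
{
  "status": "ProblemDetected",
  "problems": 1,
  "results": [
    {
      "kind": "Service",
      "name": "default/pod-svc",
      "error": [{"Text": "Service has no endpoints, expected label app=ngnx"}]
    }
  ]
}
"""


def summarize(raw: str) -> list[str]:
    """Return one 'kind name: message' line per reported error."""
    report = json.loads(raw)
    lines = []
    # "or []" guards against a missing or null list in the report.
    for result in report.get("results") or []:
        for err in result.get("error") or []:
            lines.append(f"{result['kind']} {result['name']}: {err['Text']}")
    return lines


for line in summarize(sample_output):
    print(line)
```

In a real workflow you would replace `sample_output` with the text produced by the command above, for example by reading it from standard input.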
Anonymize Flag

If you’re concerned about sending sensitive data about your workloads to OpenAI or other AI backends, the --anonymize flag can help. When used with the k8sgpt analyse command, this flag masks sensitive data such as Kubernetes object names and labels before sending it to the AI backend for analysis.

During the analysis, K8sGPT retrieves sensitive data, which is then masked before being sent to the AI backend. The backend receives the masked data, processes it, and returns a solution to the user. Once the solution is returned to the user, the masked data is replaced with the actual Kubernetes object names and labels. Here’s how you can use it:
```
k8sgpt analyze --explain -b ollama --filter=Service --output=json --anonymize
```
![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/36d7c041-a98f-48af-8724-bcbb464315b0.png align="center")
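
The mask-then-restore round trip described above can be sketched in a few lines of Python. This is only an illustration of the idea, not K8sGPT's actual implementation: the `mask` and `unmask` helpers and the `<MASKED_n>` placeholder format are made up for this example.

```python
def mask(text: str, sensitive: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each sensitive value with a placeholder and return the mapping."""
    mapping = {}
    # Longest values first, so a short value never clobbers part of a longer one.
    for index, value in enumerate(sorted(sensitive, key=len, reverse=True)):
        token = f"<MASKED_{index}>"
        mapping[token] = value
        text = text.replace(value, token)
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into text returned by the backend."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


message = "Service default/pod-svc has no endpoints, expected label app=ngnx"
masked, mapping = mask(message, ["pod-svc", "ngnx"])
print(masked)                   # what the AI backend would see
print(unmask(masked, mapping))  # what the user sees after restoration
```

The backend only ever receives the placeholders, while the mapping needed to reverse them stays on your machine.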

Note: To see more options for the k8sgpt analyze command, view its help output:
k8sgpt analyze --help

Integrations #

When we talk about any tool in the cloud-native ecosystem, the main value lies in how well it integrates with other tools from the CNCF Landscape.

Despite being an early-stage project, K8sGPT offers useful integrations that enhance its default analysis capabilities. These integrations provide additional features for scanning, diagnosing, and triaging issues in Kubernetes clusters.

In this section, we’ll focus on the Kyverno integration, which was released in the latest v0.3.39 (at the time of writing this blog).

Note: To follow along, ensure you are on the latest version of the K8sGPT CLI.

To get started, use the following command to list all the available integrations:

k8sgpt integrations list

# Output
Active:
Unused: 
> trivy
> prometheus
> aws
> keda
> kyverno

For the Kyverno integration to work, we need to install Kyverno in our cluster. Use the following commands to install Kyverno via helm:

helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno -n kyverno --create-namespace

After the installation, apply a simple validation policy that ensures a label called team is present on every Pod. Use the following YAML manifest to apply the policy to your Kubernetes cluster:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-team
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "label 'team' is required"
        pattern:
          metadata:
            labels:
              team: "?*"

After this, if you try to create a Pod without the team label, the operation won’t be allowed!

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/63a824ee-fe4a-4613-aedd-733d8ebf5bc6.png align="center")
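
In the policy above, the pattern "?*" reads as "at least one character": `?` stands for exactly one character and `*` for zero or more. The Python sketch below approximates that check with the standard library's `fnmatchcase`, which treats these two wildcards in a similar way. `has_required_label` is a hypothetical helper, and this is an approximation of the rule rather than how Kyverno evaluates policies internally.

```python
from fnmatch import fnmatchcase


def has_required_label(pod: dict, key: str = "team", pattern: str = "?*") -> bool:
    """Approximate the policy: the label must exist and match the pattern."""
    labels = pod.get("metadata", {}).get("labels") or {}
    value = labels.get(key)
    return value is not None and fnmatchcase(value, pattern)


unlabeled_pod = {"metadata": {"name": "nginx"}}
labeled_pod = {"metadata": {"name": "nginx", "labels": {"team": "platform"}}}

print(has_required_label(unlabeled_pod))  # False: would be rejected
print(has_required_label(labeled_pod))    # True: would be admitted
```

Note that an empty label value also fails the check, because "?*" needs at least one character to match.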

Once Kyverno is installed and verified, activate it as a K8sGPT integration using the following command:

k8sgpt integrations activate kyverno

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/144e6d76-c133-4c2c-963a-cc2c2dbc1b99.png align="center")

With the Kyverno integration, we get two new filters:

PolicyReport
ClusterPolicyReport

You can use the following command to get a list of updated filters:

k8sgpt filters list

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/90b069df-fb21-4358-95e2-526cacc88a13.png align="center")

We can now use one of these new filters as part of the k8sgpt analyze command to filter out relevant information about Kubernetes resources. Below is an example using the PolicyReport filter:

k8sgpt analyze -b ollama --filter PolicyReport

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/927b1afc-eb48-401f-a70b-65779e8d641a.png align="center")

This is similar to running the good old kubectl get policyreport command, as both retrieve information about policy compliance in our Kubernetes cluster. The only difference is that we now also get suggestions for fixing the issues!

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/d53861d4-9e41-443c-a373-04728e9b0b87.png align="center")

Note: At the time of writing this article, the Kyverno integration is newly added and is at its initial stages (check out the PR), so you may encounter some issues. Feel free to open an issue if you encounter any problems while working with it!

It's important to note that the example above is shown in a demo scenario, which might not showcase its full potential. However, imagine a production cluster with thousands of pods running — the accessibility and insights provided by this kind of integration would be invaluable in such a scenario!

In-cluster Operator #

The K8sGPT CLI is a great way to get started and perform ad-hoc scans in a cluster, but it involves manually running a command for each analysis. What if you want continuous 24/7 scans in the cluster? For this, you can install K8sGPT as a Kubernetes operator, which is configured through a Kubernetes custom resource and stores the reports it produces in your cluster, where you can view them as YAML manifests.

These are the different components that the K8sGPT operator installs and manages:

![](/img/blog/k8sgpt-tutorial-when-kubernetes-meets-ai/6b2a541b-876d-4f9e-bb05-7fc0dafd4a2f.png align="center")

To learn more about installing and setting up the operator, check out the documentation.

Note: The K8sGPT Operator can be customized by modifying the values.yaml file. To check the available customizable options, refer to the documentation.

Advanced Features to Try Out #

Let us take a look at two exciting features of K8sGPT that you should definitely try out.

Custom Analyzers #

Want K8sGPT to analyze your Kubernetes cluster in ways specific to your environment? K8sGPT lets you extend its capabilities by writing your own custom analyzers. Whether you need to check for specific configurations or monitor custom metrics, custom analyzers give you the flexibility to tailor the tool to your requirements.

To learn more about how to create and use custom analyzers, refer to the custom analyzers guide.

Slack Integration #

Does your team operate via Slack? K8sGPT also offers seamless integration with Slack, making it easier to get notifications and updates directly in your Slack channels. This feature ensures that your team stays informed about the health and status of your Kubernetes clusters without having to leave their communication platform. Setting up Slack integration allows you to receive real-time alerts and insights, helping your team to respond quickly to any issues.

Learn more about how to set up this integration in the Slack integration guide.

These features are certainly powerful additions to K8sGPT, enhancing its functionality and making it even more useful for managing your Kubernetes clusters.

Ending Thoughts #

K8sGPT is a powerful tool that brings the capabilities of AI to Kubernetes management, making it easier to diagnose and resolve issues, automate routine tasks, and gain valuable insights. With features like custom analyzers and Slack integration, you can tailor the tool to fit your specific needs and keep your team informed in real-time.

We've covered the essentials, from installation to advanced features. Now, it's your turn to explore and see how K8sGPT can simplify your Kubernetes operations. Give it a try, and don't hesitate to share your experiences or join the community for further discussions.

Thank you for reading, and happy troubleshooting!

Resources #

Follow Kubesimplify on Hashnode, Twitter/X and LinkedIn. Join our Discord server to learn with us!

Kunal Verma

Kunal is a DevOps and Cloud Native Advocate with a passion for Open Source. He's been involved in the DevOps and open-source ecosystem for 1.5+ years and has strong experience in public speaking, community management, and content creation. He has experience working on and contributing to some of the major projects in the CNCF, including Kubernetes, Layer5 and others. He always strives to empower others with his experiences and believes in growing with the community!
