Status and Conditions

Every Pod has a status, which describes where the Pod is in its lifecycle. A Pod moves through the following states in order:

  1. Pending: When a Pod is first created, it is in the Pending state while the scheduler looks for a Node to place it on. If the scheduler can’t find a suitable Node, the Pod stays stuck in Pending. To figure out why, run kubectl describe pod <name-of-your-pod> (see the example after this list).

  2. ContainerCreating: Once the Pod is scheduled, it enters the ContainerCreating state, where the necessary images are pulled and containers are started.

  3. Running: Once all of the containers in a Pod are running, the Pod enters the Running state, where it stays until the program completes successfully or the Pod is terminated.
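
For example, a Pod that the scheduler cannot place will sit in Pending, and the Events section of kubectl describe usually names the reason. The Pod name, node count, and message below are purely illustrative:

kubectl get pods
#> NAME                READY   STATUS    RESTARTS   AGE
#> nginx-abc123-asdf   0/1     Pending   0          3m

kubectl describe pod nginx-abc123-asdf
#> ...
#> Events:
#>   Type     Reason            Age   From               Message
#>   ----     ------            ----  ----               -------
#>   Warning  FailedScheduling  3m    default-scheduler  0/3 nodes are available: 3 Insufficient cpu.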

Conditions complement Pod status. They are an array of true or false values that tell us more about the state of a Pod. You can view the conditions of a Pod by running kubectl describe pod. The (truncated) output will look something like this:

Name:          nginx-abc123-asdf
Namespace:     default
Node:          kubenode2/192.168.1.103
Start Time:    Mon, 18 May 2020 19:20:39 -0400
Labels:        app=prod
               color=blue

...

Conditions:
   Type           Status
   Initialized    True
   Ready          True
   PodScheduled   True
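
If you just want the conditions rather than the full describe output, you can also read them straight from the Pod’s status with jsonpath. The Pod name below is the one from the example above, and the exact set of conditions varies by Kubernetes version:

kubectl get pod nginx-abc123-asdf -o jsonpath='{.status.conditions[*].type}'
#> Initialized Ready ContainersReady PodScheduled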

Readiness Probes

Here’s a dangerous little scenario that can easily happen to anybody new to K8s. If we were to run kubectl get pods, we might get something like this:

NAME                  READY      STATUS       RESTARTS   AGE
nginx-abc123-asdf     1/1        Running      0          12m 

but when we navigate to a web page served by our Pod, we are greeted with this in our browser:

[Browser error: “This site can’t be reached”]

Why does this happen? According to kubectl get pods, our Pod is ready, and its status is Running… What gives? Well, this is caused by erroneously communicating the readiness of an application. Something like Nginx may take a couple of minutes to become accessible. Even though our Nginx Pod is Running, it is not truly ready.

To determine when a Pod is actually ready, we employ Readiness Probes inside the container. As the developer of the application, you know what makes it ready. There are three types of Readiness Probes:

  1. HTTP Readiness Probe: Checks whether an HTTP GET against a specific path returns a success response.

    ...
    spec:
       containers:
          - name: my-app
            ...
            readinessProbe:
               httpGet:
                  path: /api/ready
                  port: 8080

    This checks whether a GET request to localhost:8080/api/ready inside the container returns a success response.

  2. TCP Test: Checks if a particular TCP socket is listening.

    ...
    spec:
       containers:
          - name: my-app
            ...
            readinessProbe:
               tcpSocket:
                  port: 3306

    This checks whether a TCP connection can be opened to localhost:3306 inside the container.

  3. Exec Test: Executes a custom command inside the container; the probe succeeds if the command exits with code 0.

    ...
    spec:
       containers:
          - name: my-app
            ...
            readinessProbe:
               exec:
                  command:
                  - cat
                  - /app/is_ready.txt

    This checks whether /app/is_ready.txt exists; if it does, cat exits with 0 and the probe passes.

There are some additional options we can add to our Readiness Probe.

...
spec:
   containers:
      - name: my-app
        ...
        readinessProbe:
           tcpSocket:
              port: 3306

           # 1
           initialDelaySeconds: 10

           # 2
           periodSeconds: 5

           # 3
           failureThreshold: 8

  1. initialDelaySeconds: describes how long to wait after the container starts before running the first Readiness Probe.

  2. periodSeconds: describes how long to wait between Readiness Probe attempts.

  3. failureThreshold: describes how many consecutive failed probes are allowed before the container is marked as not ready. By default, this happens after 3 failed attempts.
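
As a rough worked example with the values above: the kubelet waits 10 seconds after the container starts and then probes every 5 seconds, so a container that becomes ready 30 seconds in will be marked Ready within roughly 30 to 35 seconds, while a container that was already Ready has to fail 8 consecutive probes (about 8 × 5 = 40 seconds of failures) before it is marked not ready again.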

Configuring Pods with Readiness Probes that reflect true readiness is an important part of developing production applications.


Liveness Probes

Let’s say one of your interns deploys a vital web service with this snippet hidden away:

if today.month == 3 and today.day == 15:   # March 15th
   while True:
      pass                                 # spin forever, doing nothing

This happens to be your intern’s birthday, and they’ve decided that every year on their birthday, this container is going to seize up. Technically, the application is still running, so there is no reason for Kubernetes to think that there is anything wrong with the application. In fact, Kubernetes thinks the application is alive and well!

This may not be a real-world example, but applications can lock up due to unforeseen edge cases and nasty bugs that made it past code review. This is where Liveness Probes become useful. They are pretty similar to Readiness Probes, both in YAML syntax and in function, except they run throughout a Pod’s life to ensure the Pod is still… well… alive!

Just like with Readiness Probes, Liveness Probes come in three flavors: HTTP Test, TCP Test, and Exec Test. You can also define an initialDelaySeconds, periodSeconds, and failureThreshold, just as you would for a Readiness Probe. I won’t belabor the configs for each scenario since they’re essentially identical to Readiness Probes, but here’s an example of how you might use a Liveness Probe:

apiVersion: v1
kind: Pod
metadata:
   name: my-app
spec:
   containers:
      - name: nginx
        image: nginx

        readinessProbe:
           httpGet:
              path: /api/healthy
              port: 8080
           initialDelaySeconds: 15
           periodSeconds: 5
           failureThreshold: 8

        livenessProbe:
           httpGet:
              path: /api/healthy
              port: 8080
           initialDelaySeconds: 30
           periodSeconds: 15
           failureThreshold: 4

In this case, I’m allowing 4 failed liveness checks, issued every 15 seconds, starting 30 seconds after the container starts.
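
With those numbers, a container that hangs after coming up healthy would be restarted after roughly 4 × 15 = 60 seconds of consecutive failed probes (plus whatever time each individual probe takes to time out).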


Logging and Monitoring

When you run containers directly on Docker, you can view the logs of a running container via docker logs -f <some-container-id>. Similarly, we can view the logs of a container running within a Pod using kubectl logs -f <name-of-pod>. But what if we are running a Pod that has multiple containers within it? Simple! We add the name of the container we are targeting, like so: kubectl logs -f <name-of-pod> <name-of-container>. This shows the logs of that specific container within the Pod.
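
As a quick sketch, assuming a Pod named my-app with two containers named web and log-shipper (hypothetical names):

# follow the logs of a single-container Pod
kubectl logs -f my-app

# follow a specific container; -c <container> is equivalent to the positional form above
kubectl logs -f my-app -c log-shipper

# view the logs of the previous, crashed instance of a container
kubectl logs --previous my-app -c web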

Kubernetes comes with a Metrics Server. You run one Metrics Server per Kubernetes Cluster, but it is an in-memory monitoring solution, which means you cannot view historical performance data. For historical data, you can use something like Prometheus, the ELK stack, Datadog, or Dynatrace.

If you are running in minikube, you will need to enable the metrics server via:

minikube addons enable metrics-server

For all other environments, you will have to deploy the Metrics Server’s manifests yourself:

# pull the project from git
git clone https://github.com/kubernetes-incubator/metrics-server.git

# from the root of the cloned repo, create the objects defined in the deployment files
kubectl create -f deploy/1.8+
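
Note that the metrics-server project has since moved to the kubernetes-sigs organization, and recent versions are installed from a single released manifest. Check the project’s README for the command matching your cluster version; it generally looks like this:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml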

You can view resource consumption of Nodes via:

kubectl top node
#> NAME           CPU(cores)   CPU%    MEMORY(bytes)    MEMORY%
#> kubemaster     166m         8%      1330Mi           70%
#> kubenode1      35m          4%      1044Mi           50%
#> kubenode2      22m          2%      1048Mi           55%

And you can view resource consumption of Pods via:

kubectl top pod
#> NAME          CPU(cores)    MEMORY(bytes)
#> nginx         166m          1330Mi
#> redis         35m           1044Mi
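
If a Pod runs more than one container, kubectl top can also break the usage down per container (the output below is illustrative):

kubectl top pod nginx --containers
#> POD     NAME     CPU(cores)   MEMORY(bytes)
#> nginx   nginx    166m         1330Mi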