Kubernetes Liveness and Readiness probes

To maximize the time in which our application is up and serving our users, we need to use Liveness and Readiness probes. Liveness probes ensure that the application is healthy, and Readiness probes ensure that it is ready to receive requests. In the rest of this article, we will use the microservice application from our previous posts to showcase how to use the probes.

A prerequisite for following along with this article is knowing Kubernetes basics. If that's not the case, please read our introduction to Kubernetes and then continue here.

Getting Started

Clone the repository and switch to the branch probes by executing the command below:

$ git clone https://github.com/rinormaloku/k8s-mastery.git  \
&& cd k8s-mastery \
&& git checkout probes

Start the sa-web-app service by applying the following manifests:

$ kubectl apply -f resource-manifests/sa-web-app-deployment.yaml && \
kubectl apply -f resource-manifests/service-sa-web-app-lb.yaml

Now that we have the application and the service that exposes it, get the external IP by executing:

$ kubectl get svc

Store it in the following variable by replacing the placeholder below (it will be required for the entire article):

$ SA_WEB_APP_IP=<EXTERNAL_IP>
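
If you prefer not to copy the IP by hand, you can extract it with a jsonpath query. This is a sketch that assumes the LoadBalancer service is named sa-web-app-lb; verify the name in the output of kubectl get svc (on some cloud providers the address appears under hostname instead of ip):

$ SA_WEB_APP_IP=$(kubectl get svc sa-web-app-lb \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')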

Now we are ready to continue with Liveness Probes.

Application Health

Let's verify what we get on the endpoint /health of sa-web-app service, by executing the command below:

curl http://${SA_WEB_APP_IP}/health

To simplify testing, we provided an endpoint that puts the application into a corrupted state:

curl http://${SA_WEB_APP_IP}/destroy

The call to the endpoint destroy turns on a flag that makes the application throw exceptions on every subsequent call. Let's verify this:

curl http://${SA_WEB_APP_IP}/health

An internal server error is returned. That's a sign that the application is not healthy, but Kubernetes doesn't restart it, because as far as Kubernetes is concerned there are no health problems. We need to inform Kubernetes how to do health checks. That's the goal of Liveness Probes.

Liveness Probes Introduction

Liveness probes are defined in the container definition in the PodSpec as shown below:

# removed for brevity
        livenessProbe: 
          httpGet:                # 1
            path: /health         # 2
            port: 8080      
  1. httpGet defines that we are making an HTTP request. Other options are exec and tcpSocket (see the sketch after this list).
  2. path defines the path to be called.
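
For completeness, here is a minimal sketch of the other two probe types. The command and the file path below are illustrative assumptions, not something the sa-web-app image provides:

        livenessProbe:
          exec:                    # healthy if the command exits with code 0
            command: ["cat", "/tmp/healthy"]

        livenessProbe:
          tcpSocket:               # healthy if a TCP connection to the port can be opened
            port: 8080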

The resource manifest for the sa-web-app deployment is in [root]/resource-manifests/liveness/sa-web-app-deployment.yaml. Apply it by executing:

kubectl apply -f resource-manifests/liveness/sa-web-app-deployment.yaml

Let's corrupt the application again:

curl http://${SA_WEB_APP_IP}/destroy

Verify that the container restarts by observing the output for approx. 40 seconds:

$ kubectl get pods -w
NAME                          READY   STATUS    RESTARTS   AGE
sa-web-app-5bcc9857d6-5f2tk   1/1     Running   0          3m
sa-web-app-5bcc9857d6-rkxvs   1/1     Running   0          3m
sa-web-app-5bcc9857d6-rkxvs   1/1     Running   1          4m

Kubernetes is now informed when an application is not healthy and performs the health checks in the way we defined them. The health check is internally done by the kubelet. Kubelet?!

The kubelet is the agent installed on every node that receives the PodSpec and makes sure to run the containers and keep them healthy (or restart them otherwise).

Health Check Properties

The health checks are configurable through different properties. We can investigate the default configuration by executing the command below:

$ kubectl describe pod <pod_name> | grep Liveness:
    Liveness:       http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3

In the output, we can see how the liveness probe is performed.

  • delay=0s - there is no delay in starting the health checks.
  • timeout=1s - the probe times out after 1 second.
  • period=10s - probe every 10 seconds.
  • failure=3 - after 3 consecutive failures mark the container as unhealthy.
  • success=1 - mark the container as healthy after one successful check.

With these defaults, an unhealthy container is restarted after roughly 30 seconds (3 failures × the 10-second period).

A sample definition with all of the properties:

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 2  # delay property
          failureThreshold: 5     # failure property
          timeoutSeconds: 3       # timeout property
          periodSeconds: 10       # period property
          successThreshold: 1     # success property

Question: What happens if an application takes 1 minute to start? Follow-up question: Which property will solve the problem?
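
As a hint, here is a minimal sketch of one possible answer, assuming a 60-second startup: initialDelaySeconds postpones the first probe, which prevents the kubelet from restarting an application that is still booting.

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 60  # wait 60 seconds before the first probe

Alternatively, a higher failureThreshold combined with periodSeconds buys the application the same amount of time.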

Graceful Shutdown Period

In the file [root]/resource-manifests/liveness/sa-web-app-deployment.yaml we squeezed in another important line that we want to take a closer look at:

terminationGracePeriodSeconds: 60

This informs the kubelet to give our application 60 seconds to gracefully shut down before killing the process. For applications with state and/or long-running transactions, this is very important.
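
Note that terminationGracePeriodSeconds is a pod-level field, a sibling of containers in the PodSpec, not part of the probe configuration. A minimal sketch of its placement, with a hypothetical preStop hook (the sleep command is an illustrative assumption, not part of the sa-web-app manifests):

# removed for brevity
    spec:
      terminationGracePeriodSeconds: 60  # pod-level, sibling of 'containers'
      containers:
        - name: sa-web-app
          # image, ports and probes removed for brevity
          lifecycle:
            preStop:                     # runs before SIGTERM is sent to the container
              exec:
                command: ["sh", "-c", "sleep 5"]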

The termination cycle goes as follows:

[Figure: Pod container lifecycles]

  • The section 'Unhealthy graceful shutdown' represents the time of terminationGracePeriodSeconds, which is counted from the moment the application becomes unhealthy. When defining the value of terminationGracePeriodSeconds, you need to take into account the time it takes for the preStop hook to complete. If it takes longer than the graceful shutdown period, the SIGKILL signal will be sent before the hook runs to completion.

Kubernetes doesn't have to wait for the entire termination grace period: if the application terminates earlier, the shutdown completes right away.

Applications need to be updated to properly handle the SIGTERM signal; going into this is out of the scope of this article.

Readiness Probes

Liveness Probes ensure that the application is healthy (and restart it when it's not), while Readiness Probes determine when it is the correct time to forward requests to the application.

To put it simply:

  • Liveness Probes optimize the time in which an application is in a healthy state.
  • Readiness Probes optimize the success rate of requests, by not forwarding traffic when the application is not ready.

And due to this difference, we might want to check for different things: in the readiness check, for example, we want to ensure that the application has access to a database or an external API.

The application should not receive requests in the following typical scenarios:

  • During the starting phase of the application. It can take minutes for an application to be ready to receive requests.
  • While the application is not healthy. There is a graceful shutdown period during which the application is still Running but should not receive requests.

Sounds good! Let's apply the following readiness probe:

        readinessProbe:
          httpGet:  
            path: /ready
            port: 8080
          periodSeconds: 5
          failureThreshold: 1
          successThreshold: 1
          timeoutSeconds: 1

Apply it by executing the command below:

$ kubectl apply -f resource-manifests/readiness/sa-web-app-deployment.yaml 
deployment.extensions/sa-web-app configured

The deployment is configured with both liveness and readiness probes. Let's put this to the test in the next section.

Liveness and Readiness Probes in Action

Perform the following three steps:

  1. Open Terminal #1 to verify that the pod goes into the Not Ready state when the application becomes corrupted:
$ kubectl get pods -w
NAME                          READY   STATUS    RESTARTS   AGE
sa-web-app-5bcc9857d6-5f2tk   1/1     Running   0          3m28s
sa-web-app-5bcc9857d6-rkxvs   1/1     Running   0          3m28s
  2. Open Terminal #2 and execute a GET request against the application's health endpoint every 0.5 seconds:
$ watch -n .5 curl http://${SA_WEB_APP_IP}/health
  3. Open Terminal #3 and hit the destroy endpoint, which will put the application in an unhealthy state:
curl http://${SA_WEB_APP_IP}/destroy

Observations:

  • In terminal #2 you will see failing requests while the service is unhealthy.
  • In terminal #1 you will see the unhealthy pod go into the Not Ready state, shown as READY 0/1, as the readiness probe has failed.
  • In terminal #1, after approx. 30 seconds, the container will restart as the liveness probe has failed, and a couple of seconds later it will switch back to the Ready state.

When a container gets restarted, you usually want to check the logs to find out why the application went unhealthy. You can do this with the following command:

$ kubectl logs <pod-name> --previous
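
The kubelet also records failed probes as events on the pod. As a quick check of which probe triggered the restart, inspect the Events section of the pod description and look for 'Unhealthy' warnings (the exact wording varies by Kubernetes version):

$ kubectl describe pod <pod-name>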

Summary

In this article, you learned about Liveness and Readiness probes:

  • How they work together to ensure that the application is healthy and receives requests only when it is ready.
  • The difference between an application being in the Ready and the Not Ready state.
  • How to configure the kubelet to check for readiness and healthiness using the PodSpec.