Kubernetes Probes: Check The Pulse of Your Pods

Learn, through hands-on examples, how to use Kubernetes liveness, readiness, and startup probes to ensure application health and reliability.

Kubernetes has three probes to track containers’ health: liveness, readiness, and startup probes. In this article, you’ll learn what these probes do, how they differ, and, most importantly, how to implement them in your projects.

Liveness Probes

These probes check if the container is running as expected, meaning that the application inside the container is functioning correctly and not experiencing any critical issues. This could include scenarios such as:

  • The application is stuck in an infinite loop or deadlock.
  • The application is not responsive.

If a liveness probe fails, the kubelet terminates the container and initiates a restart according to the Pod’s restartPolicy.
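
To make the restart rule concrete, here is a small Python sketch of how consecutive probe failures might be counted. It is a simplified model, not the kubelet’s actual code: the function name `restart_needed` is hypothetical, and the default of 3 mirrors the Kubernetes default for failureThreshold.

```python
def restart_needed(probe_results, failure_threshold=3):
    """Return True once `failure_threshold` probes fail in a row.

    `probe_results` is a list of booleans (True = probe passed).
    A single success resets the failure counter.
    """
    consecutive_failures = 0
    for passed in probe_results:
        if passed:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return True
    return False


# One transient failure is tolerated; three in a row trigger a restart.
print(restart_needed([True, False, True, False]))   # False
print(restart_needed([True, False, False, False]))  # True
```

The point of the counter is that a single slow response does not restart a container that is otherwise healthy.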

Readiness Probes

These probes check if the container is ready to serve traffic. They ensure the container has finished all necessary startup tasks and can handle incoming requests.

When a readiness probe fails, the Endpoints controller removes the Pod’s IP address from the Service’s endpoints, stopping the traffic sent to the unhealthy container.

Readiness probes are essential for applications that take time to start up or have external dependencies.
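
The effect on a Service can be pictured with a tiny Python sketch. This is only a model of the idea, assuming a hypothetical `service_endpoints` helper and made-up pod addresses; the real bookkeeping is done by Kubernetes controllers.

```python
def service_endpoints(pods):
    """Return the addresses of pods that currently pass their readiness probe.

    `pods` maps a pod address to its latest readiness result.
    """
    return sorted(addr for addr, ready in pods.items() if ready)


# Illustrative addresses only.
pods = {"10.0.0.1:3000": True, "10.0.0.2:3000": True}
print(service_endpoints(pods))

# The second pod starts failing its readiness probe and drops out.
pods["10.0.0.2:3000"] = False
print(service_endpoints(pods))
```

Note that the failing pod is only filtered out of the list; nothing in this model restarts it.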

Startup Probes

These probes check if the container has started successfully. They are handy for containers that have a lengthy startup process.

The kubelet executes the startup probe during the container’s initialization phase. Kubernetes begins running the liveness and readiness probes only after the startup probe succeeds. If the startup probe keeps failing until its failureThreshold is reached, the kubelet terminates the container and restarts it according to the Pod’s restartPolicy.

Startup probes help prevent premature liveness or readiness probe failures during the container startup phase.
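
The gating behavior can be sketched as a simplified timeline in Python. The function `probe_schedule` and the notion of a shared “tick” are assumptions made for illustration; in a real cluster each probe runs on its own period.

```python
def probe_schedule(startup_succeeds_at, ticks):
    """List which probes run at each tick of a simplified timeline.

    The startup probe runs alone up to and including the tick where it
    first succeeds; liveness and readiness run only after that.
    """
    schedule = []
    for tick in range(ticks):
        if tick <= startup_succeeds_at:
            schedule.append(("startup",))
        else:
            schedule.append(("liveness", "readiness"))
    return schedule


for tick, probes in enumerate(probe_schedule(startup_succeeds_at=2, ticks=5)):
    print(tick, probes)
```

With `startup_succeeds_at=2`, ticks 0 to 2 show only the startup probe, and liveness and readiness appear from tick 3 onward.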


Difference between Readiness and Startup probes

Readiness probes determine if the container is ready to serve traffic “throughout its lifetime”. In contrast, startup probes ensure the container “has started successfully” before other probes monitor its health.


Difference between Liveness and Readiness Probes

To understand this difference, let’s consider the following scenario.

Imagine you have a web application running in the container. This application relies on a third-party API to function correctly.

Liveness Probe Scenario

  • After running for several hours, your application experiences a slowdown due to a bug in the code.
  • This slowdown causes your application to become unresponsive and unable to handle new requests.
  • The liveness probe detects that the application is not responding and triggers the container restart.
  • After the restart, the application recovers and functions normally.

Readiness Probe Scenario

  • Your application runs normally, but the third-party API it depends on becomes unavailable.
  • The readiness probe detects that the application cannot reach the third-party API, and Kubernetes stops sending traffic to the container.
  • Kubernetes continues to run the container but does not send any new requests until the third-party API becomes available again.
  • Once the third-party API is back online, the readiness probe succeeds, and Kubernetes resumes sending traffic to the container.

The key difference is that liveness probes monitor the internal health of your application and will restart the container if issues are detected. In contrast, readiness probes monitor the readiness of your application to serve traffic without restarting the container.
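
The two scenarios can be compared side by side in a short Python sketch. It is a deliberately simplified model: `ContainerState` and `apply_probe_result` are hypothetical names, and a single failure takes effect immediately, ignoring thresholds.

```python
from dataclasses import dataclass


@dataclass
class ContainerState:
    restarts: int = 0              # how many times the container was restarted
    receives_traffic: bool = True  # whether the Pod is in the Service endpoints


def apply_probe_result(state, probe, passed):
    """Update `state` the way each probe type affects a container."""
    if probe == "liveness" and not passed:
        state.restarts += 1                # liveness failure: restart
    elif probe == "readiness":
        state.receives_traffic = passed    # readiness: traffic on or off
    return state


state = ContainerState()
print(apply_probe_result(state, "readiness", False))  # traffic stops, no restart
print(apply_probe_result(state, "readiness", True))   # traffic resumes
print(apply_probe_result(state, "liveness", False))   # restart counter goes up
```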


Enough theory.

Let’s implement what we learned so far

Notes

Repository

> git clone https://github.com/decisivedevops/blog-resources.git

> cd blog-resources/k8s/probes

Building Docker Images

  • All the examples have their respective Dockerfiles for container image creation.
  • I am using a Kind cluster, so I will build the images locally and load them onto the Kind cluster node.
  • Because of this, deployment.yaml references a local image. To ensure Kubernetes does not try to pull the image from an upstream registry such as Docker Hub, I set imagePullPolicy: Never. The Kubernetes documentation covers imagePullPolicy in more detail.

Terminal

  • During demos, you can use a screen multiplexer such as Tmux to get a single-view layout for all the different kubectl commands.
  • You can also use multiple terminal windows or panes.

Examples of Liveness Probes

Example 1: HTTP GET Liveness Probe

In this example, we’ll create a simple Node.js application that exposes an endpoint for health checks and configure a liveness probe using HTTP GET.

  • Change to the liveness-probe-example directory.
cd liveness-probe-example
  • This app.js application has two endpoints:
    • /healthz: Returns an HTTP 200 OK response when the application is healthy and an HTTP 500 Internal Server Error response when it’s unhealthy.
    • /make-unhealthy: Simulates an unhealthy state by setting the healthy variable to false.
  • In deployment.yaml, we configure an HTTP GET liveness probe that checks the /healthz endpoint every 5 seconds, starting after an initial delay of 5 seconds.
  • Apply the deployment.
kubectl apply -f deployment.yaml
  • Verify that the Pod is running:
> POD=$(kubectl get pod -l app=liveness-probe-example -o jsonpath="{.items[0].metadata.name}")

> kubectl get pod/${POD}
  • Wait until you see the Pod in the “Running” state.
  • Now, we can simulate an unhealthy state by requesting the /make-unhealthy endpoint.
kubectl exec -it $POD -- curl http://localhost:3000/make-unhealthy
  • Watch the pod events:
kubectl events --watch --for Pod/$POD
  • You’ll notice that Kubernetes restarted the container because the liveness probe detected the unhealthy state.

This example demonstrates how to configure an HTTP GET liveness probe and how Kubernetes responds to a failing liveness probe by restarting the container.
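
For readers who prefer Python, the behavior described for app.js can be approximated with the standard library. This is a rough equivalent written for illustration, not the repository’s code: the names `route`, `Handler`, and `serve` are assumptions, and the response bodies are invented. Port 3000 is taken from the curl command above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

state = {"healthy": True}


def route(path):
    """Map a request path to (status_code, body), updating `state` as needed."""
    if path == "/healthz":
        return (200, "OK") if state["healthy"] else (500, "Unhealthy")
    if path == "/make-unhealthy":
        state["healthy"] = False
        return (200, "Marked unhealthy")
    return (404, "Not Found")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=3000):
    """Block and serve requests on `port`."""
    HTTPServer(("", port), Handler).serve_forever()
```

Calling `serve()` starts the server. An HTTP GET probe treats any status from 200 up to (but not including) 400 as success, so the 500 returned after /make-unhealthy counts as a failure.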

In the following example, we’ll explore a different type of liveness probe using a TCP Socket.

Example 2: TCP Socket Liveness Probe

In this example, we’ll create a simple Python application that listens on a TCP port and configure a liveness probe using a TCP Socket.

  • Navigate to the tcp-liveness-probe-example directory.
cd tcp-liveness-probe-example
  • The app.py script starts a TCP server that listens on port 8080 and accepts incoming connections. To simulate a failure, the script closes the listening socket a few seconds after it starts.
  • In deployment.yaml, we configure a TCP Socket liveness probe that checks the container’s port 8080 every 5 seconds, starting after an initial delay of 5 seconds.
  • Apply the deployment.
kubectl apply -f deployment.yaml
  • Verify that the Pod is running:
> POD=$(kubectl get pod -l app=tcp-liveness-probe-example -o jsonpath="{.items[0].metadata.name}")

> kubectl get pod/${POD}
  • Once the socket is closed, the TCP Socket liveness probe detects that the application no longer listens on the specified port, and the kubelet restarts the container. Watch the pod events to see this happen:
kubectl events --watch --for Pod/$POD

This example showcases how to configure a TCP Socket liveness probe and demonstrates Kubernetes’ ability to restart the container when the liveness probe fails.
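
What a TCP Socket check amounts to can be sketched in a few lines of Python. This is an approximation for illustration, not the kubelet’s implementation or the repository’s app.py: `tcp_probe` and `open_listener` are hypothetical helpers, and the demo binds to a free local port instead of 8080.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def open_listener(port=0):
    """Start listening on `port` (0 lets the OS pick a free one)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen()
    return listener


listener = open_listener()
port = listener.getsockname()[1]
print(tcp_probe("127.0.0.1", port))  # listener is up, so the check passes

listener.close()                      # simulate the failure described for app.py
print(tcp_probe("127.0.0.1", port))  # nothing listens any more, so it fails
```

The check only proves that something accepts connections on the port; it says nothing about whether the application behind it responds correctly.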

In the following example, we’ll explore the Exec liveness probe, which executes a command inside the container to determine its health.

Example 3: Exec Liveness Probe

In this example, we’ll create a simple bash script that simulates an application and configure a liveness probe using the Exec probe type.

  • Navigate to the exec-liveness-probe-example directory.
cd exec-liveness-probe-example
  • The app.sh script simulates an application: it writes “I am healthy” to the file /tmp/healthz, then runs a loop that counts to 59, echoing progress every 2 seconds. At the count of 10, it deletes /tmp/healthz.
  • In deployment.yaml, we configure an Exec liveness probe that runs the command if [ -f /tmp/healthz ]; then exit 0; else exit 1; fi inside the container every 5 seconds, starting after an initial delay of 10 seconds.
  • This conditional probe command checks if the file /tmp/healthz exists. If it does, the script exits with a status code 0 indicating success. If the file does not exist, it exits with a status code 1 indicating failure.
  • The liveness probe fails if the command exits with a non-zero status code.
  • Apply the deployment.
kubectl apply -f deployment.yaml
  • Verify that the Pod is running:
> POD=$(kubectl get pod -l app=exec-liveness-probe-example -o jsonpath="{.items[0].metadata.name}")

> kubectl get pod/${POD}

This example demonstrates how to configure an Exec liveness probe that executes a command inside the container to check the application’s health.
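
The exit-status logic of the probe command can be mirrored in Python to see both outcomes without a cluster. This is a sketch under assumptions: `exec_probe` is a hypothetical stand-in for the shell one-liner, and a temporary file takes the place of /tmp/healthz.

```python
import os
import tempfile


def exec_probe(path="/tmp/healthz"):
    """Mimic the probe command: exit status 0 if the file exists, else 1."""
    return 0 if os.path.isfile(path) else 1


# Walk through the lifecycle described for app.sh using a temporary file.
with tempfile.TemporaryDirectory() as workdir:
    health_file = os.path.join(workdir, "healthz")

    with open(health_file, "w") as f:
        f.write("I am healthy\n")
    print(exec_probe(health_file))  # 0: file exists, probe succeeds

    os.remove(health_file)
    print(exec_probe(health_file))  # 1: file is gone, probe fails
```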

The examples above cover the three main types of liveness probes: HTTP GET, TCP Socket, and Exec. You can choose the appropriate probe type based on your application’s requirements and the specific health check mechanism you want to implement.


Example of Readiness Probe

Example 4: HTTP GET Readiness Probe

In this example, we’ll implement a simple Node.js app with a readiness probe that checks the connectivity to multiple external dependencies (MongoDB and Redis) and the overall readiness of the application.

Here’s a step-by-step breakdown of the example:

Node.js APP

  • The application connects to MongoDB and Redis during startup.
  • The application exposes a /ready endpoint for the readiness probe.
  • When the /ready endpoint is called, it performs the following checks:
    • Checks the connectivity to MongoDB by executing a ping command.
    • Checks the connectivity to Redis by executing a ping command.
    • If either of the checks fails, the /ready endpoint returns a “Not Ready” status.
    • If both checks pass, the /ready endpoint returns a “Ready” status.
  • The application also exposes a /hostname endpoint that returns the hostname of the pod. We will use this endpoint to observe the pod traffic.
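
The /ready logic above can be sketched in Python as a function that runs each dependency check and combines the results. This is an illustration, not the repository’s app.js: `readiness` and `broken_ping` are hypothetical names, the lambdas stand in for real MongoDB and Redis pings, and the 503 status for “Not Ready” is an assumption (the walkthrough does not state which code the app returns).

```python
def readiness(checks):
    """Run every dependency check and summarise the result.

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency responds. An exception counts as
    a failed check.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        return 503, "Not Ready: " + ", ".join(failed)
    return 200, "Ready"


def broken_ping():
    """Stand-in for a ping against a dependency that is down."""
    raise ConnectionError("connection refused")


print(readiness({"mongodb": lambda: True, "redis": lambda: True}))
print(readiness({"mongodb": lambda: True, "redis": broken_ping}))
```

Running every check instead of stopping at the first failure makes the response body more useful when several dependencies are down at once.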

Deployment

  • The deployment.yaml defines deployments for the application, MongoDB, and Redis.
  • The application deployment includes a readiness probe that calls the /ready endpoint periodically to check the application’s readiness.
  • Services are defined for MongoDB and Redis so the application can reach them, and for the application itself so it can receive traffic.

Simulating Probe Failures

  • We will simulate a failure scenario where the connectivity to Redis is lost:
    • The Redis pod is deleted to simulate a failure.
    • The readiness probe fails, and the application logs show Redis connection errors.
    • The pod is removed from the service endpoints and receives no new traffic.
  • When the Redis pod is restored:
    • The Redis connection is re-established.
    • The readiness probe succeeds, and the application becomes ready.
    • The pod is re-added to the service endpoint list.
    • Traffic reaches the pod again.

  • Navigate to the comprehensive-readiness-probe directory.
cd comprehensive-readiness-probe
  • You can go through the app.js and deployment.yaml to get a better understanding of the example app.
  • Apply the deployment.
kubectl apply -f deployment.yaml
  • Verify that the Pod is running:
> POD=$(kubectl get pod -l app=comprehensive-readiness-probe -o jsonpath="{.items[0].metadata.name}")

> kubectl get pod/${POD}
  • Now that the pods are running, let’s confirm a few points we discussed earlier.
  • Let’s check the service endpoints for each service.
> kubectl get endpoints

NAME                                    ENDPOINTS                           AGE
comprehensive-readiness-probe-service   10.244.0.69:3000,10.244.0.71:3000   2m34s
mongodb-service                         10.244.0.68:27017                   2m34s
redis-service                           10.244.0.70:6379                    2m34s
  • We can see two endpoints for comprehensive-readiness-probe-service service, i.e., two pods.
  • Let’s check the pod logs.
> kubectl logs $POD

Cache connection is active
Application is ready
Received readiness probe request
Database connection is active
Cache connection is active
Application is ready
Received readiness probe request
Database connection is active
Cache connection is active
Application is ready
  • To use the /hostname endpoint, we will use another awesome tool (one of my favorites), Netshoot.
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot
  • Once the Netshoot container shell starts, run the curl loop below. Each request reaches the app.js container through the Service and returns the hostname of the pod that served it.
while true; do curl http://comprehensive-readiness-probe-service/hostname && echo &&  sleep 1 ; done;

  • Great. Now we have our setup ready, let’s make the app.js pod unhealthy and observe what happens.
  • To make the readiness probes fail, we’ll scale the Redis deployment down to 0 replicas.
kubectl scale deployment redis --replicas 0
  • As soon as Redis connectivity is lost:
    • You should see connection failure errors in the pod logs.
    • The pod events should report readiness probe failures.
    • The Netshoot curl request should also report connection errors.
  • Let’s look at why the Netshoot curl returns a “Couldn’t connect to server” error.
    • If you look at the comprehensive-readiness-probe-service endpoints, none are listed.
    • Since the pods are not passing their readiness probes, they are removed from the service endpoints.
> kubectl get endpoints comprehensive-readiness-probe-service

NAME                                    ENDPOINTS   AGE
comprehensive-readiness-probe-service               150m

But did you notice? Unlike with liveness probe failures, the container is not restarted even after numerous readiness probe failures.

> kubectl get pod -l app=comprehensive-readiness-probe

NAME                                             READY   STATUS    RESTARTS   AGE
comprehensive-readiness-probe-7d4d7cf464-dw6xs   0/1     Running   0          146m
comprehensive-readiness-probe-7d4d7cf464-sgbzd   0/1     Running   0          146m
  • Now, let’s restore the Redis pod and observe the curl running in the Netshoot container.
kubectl scale deployment redis --replicas 1
  • Once the Redis pod starts and the /ready checks can reach Redis again:
    • The errors in the app.js pod logs should stop.
    • The Netshoot curl should return the hostname of the pods correctly.
    • comprehensive-readiness-probe-service endpoint should have both pods’ endpoints.
> kubectl get endpoints comprehensive-readiness-probe-service

NAME                                    ENDPOINTS                           AGE
comprehensive-readiness-probe-service   10.244.0.69:3000,10.244.0.71:3000   155m

The key takeaways from this example are:

  • By implementing a readiness probe, you can prevent traffic from being sent to pods that are not fully ready or have issues with their dependencies.
  • When a dependency fails, the readiness probe fails, and Kubernetes stops sending traffic to the affected pods until the issue is resolved.
  • The readiness probe will succeed once the dependencies are restored, and pod traffic will resume.

Example of Startup Probe

Example 5: Startup Probe with Liveness and Readiness Probes

In this example, we’ll create a Node.js application that simulates a long startup process and configure startup, liveness, and readiness probes to demonstrate their behavior.

  • Navigate to the startup-probe-example directory.
cd startup-probe-example
  • In this example application:
    • The /ready endpoint remains the same, handling the readiness probe.
    • The /healthz endpoint remains the same, handling the liveness probe.
    • A new /startup endpoint is added to handle the startup probe. It returns a 200 status code if the application has started successfully and a 503 status code if the application is still starting up.
    • The application simulates a startup delay based on the STARTUP_DELAY environment variable (default is 30 seconds).
  • In deployment.yaml,
    • The startup probe tolerates up to 30 failures with a 5-second interval between probes, giving the container roughly 150 seconds to start up and initialize.
    • Once the startup probe succeeds, the liveness and readiness probes should take over, checking the health and readiness of the container every 5 seconds, with an initial delay of 10 seconds.
  • Apply the deployment.
kubectl apply -f deployment.yaml
  • Verify that the Pod is running:
> POD=$(kubectl get pod -l app=startup-probe-example -o jsonpath="{.items[0].metadata.name}")

> kubectl get pod/${POD}
  • Now, let’s observe the pod logs.
kubectl logs -f $POD

  • A few points to note from the logs:
    • Despite an initialDelaySeconds of 10 seconds, the readiness and liveness probes were not active until the startup probe succeeded.
    • You can test this behavior by commenting out the startup probe section in deployment.yaml and re-applying it.
    • If you reduce failureThreshold to just 1, the container will keep restarting due to startup probe failures.

As we learned earlier, in scenarios where your application has a prolonged startup process, using a startup probe in combination with liveness and readiness probes ensures that the application is given sufficient time to initialize while still being monitored for health and readiness.
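
How much time a startup probe grants can be estimated with simple arithmetic. The Python sketch below uses the values from this example (failureThreshold 30, periodSeconds 5, STARTUP_DELAY 30); the helper names are hypothetical, and the result is an approximation, since exact timing depends on when the kubelet schedules each probe.

```python
def startup_budget_seconds(failure_threshold, period_seconds, initial_delay_seconds=0):
    """Approximate time a container has to pass its startup probe."""
    return initial_delay_seconds + failure_threshold * period_seconds


def fits_budget(startup_delay_seconds, failure_threshold, period_seconds):
    """True if the app's startup delay fits inside the probe's budget."""
    return startup_delay_seconds <= startup_budget_seconds(failure_threshold, period_seconds)


print(startup_budget_seconds(30, 5))  # 150
print(fits_budget(30, 30, 5))         # True: 30 s delay fits in 150 s
print(fits_budget(30, 1, 5))          # False: failureThreshold of 1 gives only 5 s
```

The last line matches the restart loop you see when failureThreshold is lowered to 1.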

Fine-tuning Probes

While the default probe settings provide a good starting point, there may be situations where you need to fine-tune the probe parameters to suit your application’s specific requirements better.

Fine-tuning your probes allows you to:

  • Avoid unnecessary container restarts or traffic disruptions
  • Ensure your application is truly ready to serve traffic
  • Optimize probe frequency and timeout values
  • Handle application startup and shutdown gracefully

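We have already used a couple of these parameters in the examples above. Here is a quick summary of the parameters you can adjust to fine-tune your probes:

  • initialDelaySeconds: seconds to wait after the container starts before the first probe (default 0).
  • periodSeconds: how often the probe runs (default 10).
  • timeoutSeconds: how long to wait for a probe response before counting it as a failure (default 1).
  • successThreshold: consecutive successes needed to consider the probe successful after a failure (default 1; must be 1 for liveness and startup probes).
  • failureThreshold: consecutive failures needed before Kubernetes acts on the failure (default 3).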
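
As a compact reference, the timing parameters and their Kubernetes defaults can be sketched as a Python dataclass. `ProbeSettings`, `validate`, and `as_fields` are hypothetical names used only for illustration; the field names returned by `as_fields` are the ones you would write in a probe definition.

```python
from dataclasses import dataclass


@dataclass
class ProbeSettings:
    # Defaults mirror the Kubernetes defaults for probe fields.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    success_threshold: int = 1
    failure_threshold: int = 3

    def validate(self, kind):
        """Return a list of problems for a probe of the given kind."""
        problems = []
        if self.period_seconds < 1:
            problems.append("periodSeconds must be at least 1")
        if self.timeout_seconds < 1:
            problems.append("timeoutSeconds must be at least 1")
        if kind in ("liveness", "startup") and self.success_threshold != 1:
            problems.append("successThreshold must be 1 for liveness and startup probes")
        return problems

    def as_fields(self):
        """Return the settings keyed by their manifest field names."""
        return {
            "initialDelaySeconds": self.initial_delay_seconds,
            "periodSeconds": self.period_seconds,
            "timeoutSeconds": self.timeout_seconds,
            "successThreshold": self.success_threshold,
            "failureThreshold": self.failure_threshold,
        }


print(ProbeSettings().as_fields())
print(ProbeSettings(success_threshold=2).validate("liveness"))
```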

Conclusion

Throughout this article, we explored the different types of probes and their use cases. Using practical examples, we learned how to configure probes and saw how Kubernetes responds to probe failures by restarting containers or stopping traffic to unhealthy pods.

Remember, probes are not limited to HTTP, TCP, or command-based checks. Kubernetes also supports gRPC probes, allowing you to leverage the gRPC Health Checking Protocol for more advanced health monitoring scenarios.

I encourage you to experiment with different probe configurations and explore how they can improve the reliability and scalability of your applications. Test various failure scenarios, monitor probe behavior, and iterate on your configurations to find the optimal settings for your use case.


🙏 I am grateful for your time and attention all the way through!
Let me know your thoughts/ questions in the comments below.

Till we meet again, keep making waves.🌊 🚀
