As a container orchestration tool, Kubernetes (K8S) ensures your applications are always running. However, applications can fail for many reasons.
During such failures, Kubernetes tries to self-heal the application by restarting it, thanks to Pod restart policies.
We can take the term self-heal with a pinch of salt: K8S can only attempt to restart the application; it can't fix the underlying application error. But when the failure is a transient error that a simple restart can fix, this feature is handy.
In this article, we will get a practical understanding of how the restart policy works and how its values behave under different conditions.
Resource Repo
To follow the hands-on exercises, clone this git repo and change the directory to `restartPolicy`.
> git clone https://github.com/decisivedevops/blog-resources.git
> cd blog-resources/k8s/restartPolicy
Always
`restartPolicy: Always` ensures that a container is restarted whenever it exits, regardless of the exit status. `Always` is the default restart policy for pods created by controllers like Deployments, ReplicaSets, StatefulSets, and DaemonSets.
Let’s test it practically.
- Apply the `deployment.yml`.
> kubectl apply -f deployment.yml
deployment.apps/busybox-deployment created
- This simple busybox deployment starts a pod, runs a while loop, sleeps for 5 seconds, and exits with a non-zero exit code.
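If you don't have the repo cloned, here is a minimal sketch of what the manifest might look like. The exact file lives in the repo; the specific non-zero exit code used here is an assumption.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox
        args:
        - /bin/sh
        - -c
        # print, sleep 5 seconds, then exit non-zero (the exact exit code is an assumption)
        - 'while true; do echo "Hello Kubernetes!"; sleep 5; exit 1; done'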
- Get the pod status.
> kubectl get pod
NAME READY STATUS RESTARTS AGE
busybox-deployment-7c5cfcf9b8-kp5d6 0/1 Error 2 (27s ago) 40s
- The pod restarts right after the loop ends and the container exits.
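To watch the restarts happen live, you can add the watch flag (-w):
> kubectl get pod -l app=busybox -w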
- Let’s check the pod restart policy value.
> kubectl get po -l app=busybox -o yaml | grep -i restart
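The output should include a line like the one below (the grep will also match other fields, such as the pod's restartCount):
restartPolicy: Always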
Two points to note here.
- We have not explicitly set the `restartPolicy` field in the deployment spec; `Always` is the default value.
- The container exited with a non-zero status code, so you might wonder what happens if the container actually does its job and exits successfully, i.e., with a `0` exit code.
Let’s try this.
- Update the while loop in `deployment.yml` to exit with status code `0`.
- 'while true; do echo "Hello Kubernetes!"; sleep 5; exit 0; done'
- Re-apply the deployment and observe the pod restart count.
> kubectl apply -f deployment.yml
deployment.apps/busybox-deployment configured
> kubectl get po -l app=busybox
NAME READY STATUS RESTARTS AGE
busybox-deployment-7c5cfcf9b8-d5dzj 0/1 Completed 1 (7s ago) 14s
# you can see a difference in the STATUS column.
# with non-zero exit code, it will be Error.
# with zero exit code, it will be Completed.
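Note that the kubelet restarts exited containers with an exponential backoff (starting at 10s and doubling, capped at five minutes), so after a few restarts you will also see the pod in a CrashLoopBackOff status. You can see the backoff events with:
> kubectl describe pod -l app=busybox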
- However, it’s still restarting.
Wait, our pod completed its work successfully and exited with a proper exit status, so why is it in a restart loop?
This restart behavior is inherent to how Deployments work: the controller ensures that the specified number of pod replicas is always running.
But what if we want to run a pod, complete the task, and exit? That's where the other values of `restartPolicy` are helpful.
But there’s a catch with those values.
OnFailure and Never
As the names suggest, `restartPolicy: OnFailure` restarts a container only when it fails, i.e., exits with a non-zero exit status. `restartPolicy: Never` does not restart the container at all, whether it fails or exits successfully.
Demo time.
- Update the `deployment.yml` to include `restartPolicy: OnFailure`.
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - /bin/sh
    - -c
    - 'while true; do echo "Hello Kubernetes!"; sleep 5; exit 0; done'
  restartPolicy: OnFailure
- Re-apply the deployment.
> kubectl apply -f deployment.yml
The Deployment "busybox-deployment" is invalid: spec.template.spec.restartPolicy: Unsupported value: "OnFailure": supported values: "Always"
- Typo, maybe? It doesn't look like it, as the error says `supported values: "Always"`. What happened to the other two, then?
- Turns out, the values `OnFailure` and `Never` are only applicable when:
  – the pod is launched on its own (i.e., without a controller like a Deployment, ReplicaSet, StatefulSet, or DaemonSet), or
  – the pod is started by a Job (including a CronJob).
- If you want to dive into the whys and hows of this decision, there is a long-running discussion thread on GitHub.
To start a pod, let it complete its task, and exit without restarting, we need a Job controller.
`job.yml` defines a simple Job that starts a pod, prints out a string, and exits. It sets `restartPolicy: OnFailure`.
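For reference, here is a minimal sketch of what the manifest might look like; the exact file lives in the repo, and the echoed string here is an assumption based on the earlier examples.
apiVersion: batch/v1
kind: Job
metadata:
  name: busybox-job
spec:
  template:
    spec:
      containers:
      - name: busybox
        image: busybox
        # print a string and exit 0 (the exact command is an assumption)
        command: ["/bin/sh", "-c", "echo Hello Kubernetes!"]
      restartPolicy: OnFailure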
- Apply `job.yml`.
> kubectl apply -f job.yml
job.batch/busybox-job created
- Check the pod status.
> kubectl get pod --field-selector=status.phase=Succeeded
NAME READY STATUS RESTARTS AGE
busybox-job-hn69d 0/1 Completed 0 1m
- This time, the pod is not restarted after completion.
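You can also confirm this at the Job level:
> kubectl get job busybox-job
# the COMPLETIONS column should show 1/1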
You can test the configurations below to get a better idea of how `restartPolicy` works for a Job.
1. Update the `job.yml` to exit the pod with a non-zero status code.
command: ["/bin/sh", "-c", "echo Hello Kubernetes! && exit 1"]
> kubectl delete -f job.yml
> kubectl apply -f job.yml
# ------
# the pod should be in a restart loop (RESTARTS keeps increasing)
> kubectl get pod
2. Update the `restartPolicy` to `Never` and check the pod status.
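With `Never`, the failed container is not restarted in place; instead, the Job controller creates replacement pods, up to its `backoffLimit` (6 by default). A quick way to see the difference:
> kubectl get pod
# with OnFailure: a single pod whose RESTARTS count keeps increasing
# with Never: multiple pods in Error status, each with RESTARTS at 0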
Conclusion
A few people pointed out on this thread that the `Never` and `OnFailure` values are helpful in cases where we need to debug a pod that has issues only in a Kubernetes environment.
Also, it's worth noting that the official K8S documentation does not yet clearly describe how these different values behave.
Until we have some clarification on these values from the official K8S team, here are a few key takeaways.
- Restart Policies Define Pod Resilience: The `Always` restart policy, the default for pods managed by Deployments and other controllers, ensures continuous availability by restarting containers regardless of their exit status. This policy suits applications that must be available at all times.
- Context-Specific Policy Usage: While `Always` is suitable for long-running applications, the `OnFailure` and `Never` restart policies are specifically helpful for standalone pods (a rare case) or those managed by Jobs and CronJobs, catering to batch tasks that should not restart once completed.
- Deployment Limitations and Job Flexibility: Deployments enforce the `Always` policy to maintain service availability, which is a limitation for use cases that require no restarts. Jobs offer flexibility, allowing the `OnFailure` and `Never` policies so tasks run to completion as intended, without unnecessary restarts.