How to optimize graceful shutdown in Kubernetes and avoid customer impact

Shay Dratler
Apr 18 · 5 min read


Header image created by NightCafe

Many applications and services run on Kubernetes infrastructure from day one. Yet the out-of-the-box setup might not let you fine-tune the termination process. Yes, there is a graceful shutdown flow within Kubernetes, but it doesn’t take in-flight requests into account. As a result, an end user can wait a long time only to get a 5XX error (or worse!). Building an application that is working and durable is important, but it is equally important to consider how it terminates.

Why do bad terminations occur?

A service, user, timer, or other entity sends a kill or shutdown signal (SIGTERM*) to the Kubernetes pod. Meanwhile, the endpoint keeps routing traffic to the pod’s application server during the termination flow. As I mentioned, Kubernetes is not aware of this incoming traffic, so any open requests might get lost.

  • We will discuss SIGTERM as the kill signal, but there are many types of kill signals. The SIGTERM signal is a generic signal used to cause program termination. Unlike SIGKILL, this signal can be blocked, handled, and ignored. It is the normal way to politely ask a program to terminate. The shell command kill generates SIGTERM by default.

Naive approach

Kubernetes has a solution for it:
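A minimal sketch of where this setting lives in a Deployment manifest (the names and values here are illustrative, not from a real workload):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                          # illustrative name
spec:
  template:
    spec:
      # Max seconds Kubernetes waits after SIGTERM before force-killing the pod
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: my-app:latest            # illustrative image
```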


This configuration is part of the deployment. By adding it, you can define or change the number of seconds that Kubernetes waits before forcibly terminating the pod.

Same as before, we are sending the kill signal, but this time the pod application server will use the terminationGracePeriod definition written within the deployment and will wait the number of seconds that we defined.

Unfortunately, this is still error prone. What if you are using a synchronous approach and some requests take longer to handle than the defined number of seconds, causing a timeout? This could result in requests with no reply, ending in a gateway timeout or a server-not-accessible error (503 or 504).

For example, when pods autoscale, the mechanism adds capacity to avoid overloading the pods and causing an outage. Once traffic drops, a kill signal removes the pods that are no longer needed, even though they may still be serving requests.

End-to-end solution approach

In order to provide a more stable solution, we can combine terminationGracePeriod for the entire pod with a preStop hook on the application container. This gives the pod some extra graceful time to drain its connections.

       # application container
       lifecycle:
         preStop:
           exec:
             command:
             - /bin/bash
             - -c
             - sleep 20

# for the entire pod
terminationGracePeriodSeconds: 160 # just an arbitrary number

Now we need to make some changes to the application so that it will know that it’s about to be terminated.

In Spring Boot (2.3+), for example, you can enable graceful shutdown and cap each shutdown phase in application.properties:

server.shutdown=graceful
spring.lifecycle.timeout-per-shutdown-phase=30s

In this example, when the kill signal is sent to the application container, Spring will start a graceful shutdown by itself and wait up to the configured number of seconds for in-flight requests to complete.

You can also do it manually, using a @PreDestroy-annotated bean or the DisposableBean interface. Annotation sample:

import javax.annotation.PreDestroy;

public class DestroyBean {
    @PreDestroy
    public void destroyMethod() {
        // destroy logic goes here
    }
}

Interface sample:

import org.springframework.beans.factory.DisposableBean;

public class DestroyerBean implements DisposableBean {
    @Override
    public void destroy() throws Exception {
        // destroy logic goes here
    }
}

The same idea but with Node code:

const express = require('express');

const app = express();
// your server code goes here

const server = app.listen(3000, () => console.log('Server is up using port 3000'));

process.on('SIGTERM', () => {
  // on close - do stuff here, e.g. drain connections and flush state
  server.close(() => process.exit(0));
});
The application is listening and waiting to get SIGTERM, the signal that the application is about to be terminated. From there, we need to work out how to terminate safely. For example, we would mark health check responses as unhealthy, stop consuming new messages from queues like Kafka and SQS, and store our data on persistent storage. The most important part is that we do not lose any data whatsoever.
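The health-check part of that list can be sketched as a simple flag that is flipped when SIGTERM arrives, so the readiness probe starts failing before the process exits. This is a minimal sketch; the function and flag names are mine, not from a real framework:

```javascript
// Minimal readiness-flag sketch: on SIGTERM, fail the readiness probe
// first so Kubernetes stops routing new traffic, then clean up and exit.

let shuttingDown = false;

// What the readiness endpoint would return as an HTTP status code.
function readinessStatus() {
  return shuttingDown ? 503 : 200;
}

process.on('SIGTERM', () => {
  shuttingDown = true; // readiness probe now reports unhealthy
  // then: stop queue consumers (Kafka, SQS), flush state to
  // persistent storage, and close the server...
});

console.log(readinessStatus()); // 200 while healthy
process.emit('SIGTERM');        // simulate the kill signal
console.log(readinessStatus()); // 503 once shutdown begins
```

In a real service the readiness endpoint would return this status, and the SIGTERM handler would also call `server.close()` as in the Express example above.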

Same as before, a kill signal is sent to the pod. This time, Kubernetes first runs the preStop hook and waits the number of seconds defined there, which gives load balancers time to stop routing traffic to the pod. Then SIGTERM is sent to the application, which starts its own graceful shutdown. The whole sequence is bounded by the terminationGracePeriod configured in the deployment.

This approach provides an end-to-end solution and reduces the impact. More importantly, it gives you a strategy for handling a pod that is about to be terminated. It’s not 100% bulletproof: some asynchronous requests might still get lost if you are working at huge scale. But, as I see it, this is the safest way to handle pod termination. With the code options shared above, you can customize the shutdown process, fix potential issues, and cover all known integration points.

Some final words

In my experience, leaving the defaults in place is not recommended. If you don’t know what the right values are for you, start by measuring your traffic. Consider what your slowest response time is, and add or change the value of terminationGracePeriod accordingly. If you can add graceful shutdown to the application, or listen for SIGTERM before handling the termination process, you’ll hit the sweet spot.

Also measure how much time the application needs in order to drain requests; your TP95 response time is a good proxy for this. And, as always: measure, measure, measure. Only then can you fully understand the impact of pod termination and reduce it.
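As a rough sketch of turning that measurement into a number, here is one way to compute TP95 from a set of observed latencies and derive a starting value for terminationGracePeriod (the sample data and the extra headroom are made up for illustration):

```javascript
// Derive a terminationGracePeriodSeconds candidate from observed
// request latencies: take the 95th percentile (TP95) and add headroom.

function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: smallest value covering p percent of samples.
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[idx];
}

// Response times in seconds (illustrative data, not real measurements).
const latencies = [0.2, 0.4, 0.5, 0.7, 1.1, 1.3, 2.0, 2.4, 3.8, 9.5];

const tp95 = percentile(latencies, 95);
// Grace period = time to drain the slowest typical request, plus a buffer.
const gracePeriodSeconds = Math.ceil(tp95) + 10;

console.log(tp95, gracePeriodSeconds);
```

In practice you would pull these latencies from your metrics system rather than a hard-coded array, and re-check the value as traffic patterns change.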


