Rolling Updates

A rolling update replaces application instances incrementally: new pods are created while old pods are terminated in controlled batches. New pods must pass their health checks before the rollout proceeds to the next batch. This is the default strategy for Kubernetes Deployments.

maxSurge and maxUnavailable

These two parameters control the pace of a rolling update. They determine how many pods can exist above the desired count (surge) and how many can be unavailable during the update.

Parameter	Default	Accepts	Effect
maxSurge	25%	Integer or percentage	Maximum number of pods created above the desired replica count during the update. Higher values speed up the rollout but use more resources.
maxUnavailable	25%	Integer or percentage	Maximum number of pods that can be unavailable during the update. Lower values maintain more capacity but slow the rollout. Cannot be 0 if maxSurge is also 0.

Conservative (zero-downtime priority)

maxSurge: 1, maxUnavailable: 0. Always maintains full capacity. Each new pod must be Ready before an old pod is terminated, so pods are replaced one at a time. Slowest but safest.

Balanced (default)

maxSurge: 25%, maxUnavailable: 25%. Good balance between speed and availability. For a 10-replica deployment, 25% is 2.5 pods: maxSurge rounds up to 3 extra pods, while maxUnavailable rounds down to 2 unavailable pods at any time.

Aggressive (speed priority)

maxSurge: 50%, maxUnavailable: 50%. Fastest rollout at the cost of temporarily reduced capacity and higher resource usage. Use only when you have excess capacity and need fast deployments.
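
As a rough illustration, the conservative profile might be written in a Deployment manifest as follows. This is a fragment rather than a complete manifest, and the replica count of 10 is an assumption chosen to match the example above. The commented lines show how the other two profiles would be substituted.

```yaml
# Fragment of an apps/v1 Deployment; only the fields that pace the rollout are shown.
spec:
  replicas: 10            # assumed value, for illustration only
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Conservative: surge by one pod, never drop below the desired count.
      maxSurge: 1
      maxUnavailable: 0
      # Balanced (the defaults):  maxSurge: "25%", maxUnavailable: "25%"
      # Aggressive:               maxSurge: "50%", maxUnavailable: "50%"
```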

Health Check Configuration

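Rolling updates rely on readiness probes to determine when a new pod can serve traffic and, by extension, when an old pod can be safely terminated. Without a readiness probe, a pod counts as Ready as soon as its containers have started, so the rollout can proceed before the application is able to handle requests.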

Probe Type	Purpose	Typical Endpoint	Recommended Settings
Readiness	Gates traffic to the pod. Pod receives requests only after passing.	/healthz or /ready	initialDelaySeconds: 5, periodSeconds: 5, failureThreshold: 3
Liveness	Restarts the container if it becomes unresponsive (for example, a deadlock).	/healthz or /livez	initialDelaySeconds: 15, periodSeconds: 10, failureThreshold: 3
Startup	Holds off liveness and readiness checks until it succeeds, for slow-starting apps. Prevents premature kills.	/healthz	failureThreshold: 30, periodSeconds: 10 (30 x 10s allows up to 5 minutes for startup)
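
The recommended settings in the table could be expressed in a pod template roughly as shown below. This is a sketch, not a drop-in configuration: the container name, image, and port 8080 are assumptions, and the endpoint paths are simply one choice from the "Typical Endpoint" column.

```yaml
# Fragment of a pod template (spec.template.spec in a Deployment).
containers:
  - name: web                      # hypothetical container name
    image: example.com/web:1.0     # hypothetical image
    ports:
      - containerPort: 8080        # assumed port
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 30         # 30 x 10s = up to 300s to start
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /livez
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
      failureThreshold: 3
```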

Pod Disruption Budgets

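A PodDisruptionBudget (PDB) limits the number of pods that can be simultaneously disrupted by voluntary evictions, such as node drains and cluster upgrades. A Deployment's own rolling update is not constrained by a PDB; its pace is governed by maxSurge and maxUnavailable. A PDB complements those settings by protecting capacity when nodes are drained, including while a rollout is in progress.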

PDB Setting	Example	Effect
minAvailable	minAvailable: 3	At least 3 pods must remain healthy (Ready). Evictions that would violate this are refused.
maxUnavailable	maxUnavailable: 1	At most 1 pod can be unavailable at a time. Safer for stateful workloads.
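
A PDB using the maxUnavailable setting from the table might look like the following sketch. The object name and the app: web label are assumptions; the selector has to match the labels on the pods being protected.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # hypothetical name
spec:
  maxUnavailable: 1        # alternatively minAvailable: 3; set only one of the two
  selector:
    matchLabels:
      app: web             # assumed label; must match the pods' labels
```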

Graceful Shutdown

When a pod is terminated during a rolling update, Kubernetes sends a SIGTERM signal. The application must handle this signal to drain in-flight requests before exiting.

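1. SIGTERM received. Kubernetes sends SIGTERM to the main process of each container in the pod. In parallel, the pod is removed from the Service endpoints. The two actions are not synchronized, so a few new requests can still arrive after SIGTERM.
2. Stop accepting new connections. The application should fail its readiness probe (return 503) and stop accepting new requests, ideally after a brief delay so that the endpoint removal has had time to propagate.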
3. Drain in-flight requests. Allow existing requests to complete. Set a reasonable drain timeout (e.g., 15-30 seconds).
4. Close connections and exit. Close database pools, flush buffers, and exit with code 0.
5. SIGKILL after grace period. If the process has not exited after terminationGracePeriodSeconds (default 30s), Kubernetes sends SIGKILL. Set this value higher for applications with long-running requests.
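
On the Kubernetes side, the shutdown sequence above is tuned in the pod template. The fragment below is a sketch under stated assumptions: the 60-second grace period and 5-second delay are illustrative values, the container name and image are hypothetical, and the preStop hook (not covered in the steps above) is one common way to give endpoint removal time to propagate before SIGTERM is delivered.

```yaml
# Fragment of a pod template (spec.template.spec in a Deployment).
terminationGracePeriodSeconds: 60   # assumed value; raised from the 30s default
containers:
  - name: web                       # hypothetical container name
    image: example.com/web:1.0      # hypothetical image
    lifecycle:
      preStop:
        exec:
          # Runs before SIGTERM is sent; assumes the image ships a `sleep` binary.
          command: ["sleep", "5"]
```

Time spent in the preStop hook counts against terminationGracePeriodSeconds, so the grace period should cover the hook delay plus the application's drain timeout.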

Kubernetes Deployment Spec Reference

Key fields in a Kubernetes Deployment spec that affect rolling update behavior.

Field	Path	Default	Notes
strategy.type	spec.strategy.type	RollingUpdate	Alternative: Recreate (terminates all pods first)
maxSurge	spec.strategy.rollingUpdate.maxSurge	25%	Rounded up for percentages
maxUnavailable	spec.strategy.rollingUpdate.maxUnavailable	25%	Rounded down for percentages
minReadySeconds	spec.minReadySeconds	0	Seconds a new pod must be Ready before it counts as available. Set to 10-30 for stability.
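progressDeadlineSeconds	spec.progressDeadlineSeconds	600	If the rollout makes no progress in this time, the Deployment's Progressing condition is set to False with reason ProgressDeadlineExceeded. Kubernetes does not roll back automatically.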
revisionHistoryLimit	spec.revisionHistoryLimit	10	Number of old ReplicaSets to retain for rollback.
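
To show where these fields sit relative to one another, here is a minimal Deployment sketch that sets each of them. The name, labels, image, and replica count are assumptions, and minReadySeconds is set to 10 only as an example from the range suggested in the table.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical name
spec:
  replicas: 4                        # assumed value
  minReadySeconds: 10                # example value; the default is 0
  progressDeadlineSeconds: 600       # default
  revisionHistoryLimit: 10           # default
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "25%"                # 25% of 4 replicas = 1 extra pod
      maxUnavailable: "25%"          # 25% of 4 replicas = 1 unavailable pod
  selector:
    matchLabels:
      app: web                       # assumed label
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web                  # hypothetical container name
          image: example.com/web:1.0 # hypothetical image
```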