Rolling Updates

A rolling update replaces application instances incrementally: new pods are created and old pods are terminated in controlled batches, and each batch must pass its readiness checks before the next batch proceeds. RollingUpdate is the default strategy for Kubernetes Deployments.

maxSurge and maxUnavailable

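These two parameters control the pace of a rolling update. maxSurge sets how many pods can exist above the desired replica count, and maxUnavailable sets how many of the desired pods can be unavailable during the update. Each accepts an absolute number or a percentage of the desired replica count, and they cannot both be set to 0.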

Parameter | Default | Accepts | Effect
maxSurge | 25% | Integer or percentage | Maximum number of pods created above the desired replica count during the update. Higher values speed up the rollout but use more resources.
maxUnavailable | 25% | Integer or percentage | Maximum number of pods that can be unavailable during the update. Lower values maintain more capacity but slow the rollout.
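To make the rounding rules concrete, here is a minimal Python sketch that turns maxSurge and maxUnavailable into absolute pod counts. It assumes percentages are whole numbers written like "25%"; the helper names resolve and rollout_bounds are hypothetical and not part of any Kubernetes client library.

```python
from typing import Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Convert an int-or-percentage such as 1 or "25%" into an absolute pod count."""
    if isinstance(value, int):
        return value
    if not value.endswith("%"):
        raise ValueError(f"expected an int or a percentage like '25%', got {value!r}")
    scaled = int(value[:-1]) * replicas
    # Integer arithmetic avoids float rounding surprises.
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(replicas: int, max_surge: IntOrPercent = "25%",
                   max_unavailable: IntOrPercent = "25%") -> dict:
    """Absolute limits a rolling update must respect (hypothetical helper)."""
    surge = resolve(max_surge, replicas, round_up=True)                # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)   # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # Kubernetes validation rejects an explicit 0 for both; this sketch also
        # refuses combinations that merely resolve to 0.
        raise ValueError("maxSurge and maxUnavailable cannot both resolve to 0")
    return {
        "surge": surge,
        "unavailable": unavailable,
        "max_total_pods": replicas + surge,            # old + new pods never exceed this
        "min_available_pods": replicas - unavailable,  # available pods never drop below this
    }
```

For example, rollout_bounds(10) reports a surge of 3 and 2 unavailable, so the Deployment stays between 8 available pods and 13 total pods during the rollout.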

Conservative (zero-downtime priority)

maxSurge: 1, maxUnavailable: 0. Always maintains full capacity: a new pod must be Ready before any old pod is terminated, so the cluster needs room for one extra pod at a time. Slowest but safest.

Balanced (default)

maxSurge: 25%, maxUnavailable: 25%. Good balance between speed and availability. For a 10-replica deployment, 25% is 2.5 pods, so maxSurge rounds up to 3 extra pods and maxUnavailable rounds down to 2 unavailable pods at any time.

Aggressive (speed priority)

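maxSurge: 50%, maxUnavailable: 50%. Fastest rollout, at the cost of temporarily reduced capacity and higher resource usage: a 10-replica deployment can run up to 15 pods while only 5 are guaranteed to be available. Use only when the remaining pods can absorb the full load and fast deployments matter more than headroom.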
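The three profiles differ mainly in how many pods move per step. The toy Python model below is a sketch under strong assumptions (every new pod becomes Ready by the next step, scheduling never fails, minReadySeconds is 0): each step first removes as many old pods as the availability floor allows, then adds new pods up to the surge ceiling. It takes maxSurge and maxUnavailable already resolved to integers. It illustrates relative speed only; the real Deployment controller reacts to pod status changes rather than fixed steps, and simulate_rollout is a hypothetical name.

```python
def simulate_rollout(replicas: int, surge: int, unavailable: int) -> int:
    """Count steps in a toy rolling-update model (not the real controller's timing)."""
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    max_total = replicas + surge            # ceiling on old + new pods
    min_available = replicas - unavailable  # floor on Ready pods
    old, new, new_ready = replicas, 0, 0
    steps = 0
    while old > 0 or new_ready < replicas:
        steps += 1
        # Scale down old pods only while enough Ready pods remain.
        removable = old + new_ready - min_available
        old -= max(0, min(old, removable))
        # Scale up new pods as far as the surge ceiling allows.
        new += max(0, min(replicas - new, max_total - old - new))
        # Toy assumption: pods created in this step are Ready by the next one.
        new_ready = new
    return steps
```

Under these assumptions, 10 replicas take 11 steps with the conservative settings (1, 0), 3 steps with the defaults resolved to (3, 2), and 2 steps with the aggressive settings resolved to (5, 5).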

Health Check Configuration

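Rolling updates rely on readiness checks to decide when a new pod can serve traffic and counts as available; only then does the controller terminate further old pods. Liveness and startup probes do not gate the rollout directly, but a misconfigured one can keep restarting new pods and stall it.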

Probe Type | Purpose | Typical Endpoint | Recommended Settings
Readiness | Gates traffic to the pod. The pod receives requests only while the probe passes. | /healthz or /ready | initialDelaySeconds: 5, periodSeconds: 5, failureThreshold: 3
Liveness | Restarts the container if it becomes unresponsive (for example, deadlocked). | /healthz or /livez | initialDelaySeconds: 15, periodSeconds: 10, failureThreshold: 3
Startup | Holds off liveness and readiness checks until it succeeds, preventing premature kills of slow-starting apps. | /healthz | failureThreshold: 30, periodSeconds: 10 (30 × 10 s allows up to 5 minutes of startup)
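As a sketch of what the probe endpoints might look like on the application side, the Python standard-library server below serves /livez (200 whenever the process can answer) and /ready (503 until the application marks itself ready). HealthState, start_health_server, and port 8080 are illustrative assumptions. probe_succeeds mimics the kubelet's rule that an HTTP probe succeeds for any status code from 200 up to, but not including, 400.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthState:
    """Readiness flag the application flips (hypothetical helper)."""

    def __init__(self) -> None:
        self._ready = threading.Event()

    def set_ready(self, ready: bool) -> None:
        if ready:
            self._ready.set()
        else:
            self._ready.clear()

    @property
    def ready(self) -> bool:
        return self._ready.is_set()


def make_handler(state: HealthState) -> type:
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path == "/livez":
                # Liveness: answering at all means the process is not hung.
                self._respond(200, b"ok")
            elif self.path == "/ready":
                # Readiness: gate traffic until the app says it can serve.
                self._respond(200, b"ready") if state.ready else self._respond(503, b"not ready")
            else:
                self._respond(404, b"not found")

        def _respond(self, code: int, body: bytes) -> None:
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep probe traffic out of the logs

    return HealthHandler


def start_health_server(state: HealthState, host: str = "0.0.0.0",
                        port: int = 8080) -> ThreadingHTTPServer:
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe_succeeds(url: str, timeout: float = 1.0) -> bool:
    """Kubelet-style HTTP check: 200 <= status < 400 counts as success."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxies
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        return 200 <= err.code < 400
    except OSError:
        return False  # connection refused, timeout, etc.
```

Keeping the liveness handler free of dependency checks is a deliberate choice here: if a shared dependency outage failed liveness, every pod would be restarted at once, which rarely helps.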

Pod Disruption Budgets

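A PodDisruptionBudget (PDB) limits how many pods of an application can be down at the same time due to voluntary disruptions that go through the Eviction API, such as node drains and cluster upgrades. Deployment rolling updates are not limited by a PDB (maxUnavailable governs them), but pods that are unavailable because of a rollout do count against the budget. A PDB therefore complements maxUnavailable rather than replacing it.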

PDB Setting | Example | Effect
minAvailable | minAvailable: 3 | At least 3 selected pods must remain available. Evictions that would violate this are refused.
maxUnavailable | maxUnavailable: 1 | At most 1 selected pod can be unavailable at a time. Safer for stateful workloads.
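The budget arithmetic can be sketched in a few lines of Python. This assumes integer (not percentage) budget values and follows the idea that allowed disruptions equal the currently healthy pods minus the desired healthy count, floored at zero. disruptions_allowed and drain_node are hypothetical helpers, and the drain model is a snapshot that ignores replacement pods being scheduled elsewhere.

```python
from typing import List, Optional


def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """How many more pods may be evicted right now under a PDB (integer budgets only)."""
    # A PDB sets exactly one of minAvailable or maxUnavailable.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, current_healthy - desired_healthy)


def drain_node(pods_on_node: List[str], current_healthy: int, expected_pods: int,
               **budget: int) -> List[str]:
    """Evict pods one at a time until the budget says stop (snapshot model)."""
    evicted = []
    for pod in pods_on_node:
        if disruptions_allowed(current_healthy, expected_pods, **budget) == 0:
            break  # the Eviction API would refuse this eviction for now
        evicted.append(pod)
        current_healthy -= 1  # the evicted pod no longer counts as healthy
    return evicted
```

With 5 healthy pods and minAvailable: 3, two evictions go through and the third is refused until a replacement becomes healthy; a real kubectl drain keeps retrying refused evictions rather than giving up.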

Graceful Shutdown

When a pod is terminated during a rolling update, the kubelet sends SIGTERM to the main process of each container (after running any preStop hook). The application should handle this signal and drain in-flight requests before exiting.

  1. SIGTERM received. Kubernetes sends SIGTERM to the container's main process and, in parallel, removes the pod from the Service endpoints. Because endpoint removal propagates asynchronously, a few new requests may still arrive shortly after SIGTERM.
  2. Stop accepting new connections. The application should stop listening for new requests (or return 503 on the readiness probe).
  3. Drain in-flight requests. Allow existing requests to complete, with a drain timeout (e.g., 15-30 seconds) that fits within terminationGracePeriodSeconds.
  4. Close connections and exit. Close database pools, flush buffers, and exit with code 0.
  5. SIGKILL after grace period. If the process has not exited after terminationGracePeriodSeconds (default 30s), Kubernetes sends SIGKILL. Set this value higher for applications with long-running requests.
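Steps 1 to 4 can be sketched in Python as follows. The in-flight counter, the GracefulDrainer class, graceful_shutdown, and the 20-second default drain timeout are assumptions for illustration; a real server would wrap each handler in drainer.request() and report drainer.ready from its readiness endpoint.

```python
import signal
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


class GracefulDrainer:
    """Tracks in-flight requests so shutdown can wait for them (hypothetical helper)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._shutting_down = False

    @property
    def ready(self) -> bool:
        # Readiness should report failure (503) once shutdown has begun.
        with self._cond:
            return not self._shutting_down

    @contextmanager
    def request(self) -> Iterator[None]:
        """Wrap each request; refuses new work once shutdown has begun."""
        with self._cond:
            if self._shutting_down:
                raise RuntimeError("shutting down")  # a server would map this to 503
            self._in_flight += 1
        try:
            yield
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def begin_shutdown(self) -> None:
        """Step 2: stop accepting new requests and fail readiness."""
        with self._cond:
            self._shutting_down = True

    def wait_for_drain(self, timeout: float) -> bool:
        """Step 3: wait until in-flight requests finish; False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)


def install_sigterm_handler(on_term: threading.Event) -> None:
    """Step 1: turn SIGTERM into an event (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, lambda signum, frame: on_term.set())


def graceful_shutdown(drainer: GracefulDrainer, close_resources: Callable[[], None],
                      drain_timeout: float = 20.0) -> bool:
    """Steps 2-4; keep drain_timeout below terminationGracePeriodSeconds."""
    drainer.begin_shutdown()
    drained = drainer.wait_for_drain(drain_timeout)
    close_resources()  # step 4: close database pools, flush buffers
    return drained
```

In this sketch, the main thread would create a threading.Event, call install_sigterm_handler(event), start serving, block on event.wait(), then call graceful_shutdown and exit with status 0. Python only allows signal handlers to be installed from the main thread, which is why the handler merely sets an event.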

Kubernetes Deployment Spec Reference

Key fields in a Kubernetes Deployment spec that affect rolling update behavior.

Field | Path | Default | Notes
strategy.type | spec.strategy.type | RollingUpdate | Alternative: Recreate (terminates all existing pods before creating new ones)
maxSurge | spec.strategy.rollingUpdate.maxSurge | 25% | Percentages are rounded up
maxUnavailable | spec.strategy.rollingUpdate.maxUnavailable | 25% | Percentages are rounded down
minReadySeconds | spec.minReadySeconds | 0 | Seconds a new pod must stay Ready before it counts as available. Set to 10-30 for stability.
progressDeadlineSeconds | spec.progressDeadlineSeconds | 600 | If the rollout makes no progress within this time, the Deployment's Progressing condition is set to False (reason ProgressDeadlineExceeded). Kubernetes does not roll back automatically.
revisionHistoryLimit | spec.revisionHistoryLimit | 10 | Number of old ReplicaSets to retain for rollback.
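To tie the fields together, here is a hedged Python sketch that builds a Deployment manifest as a plain dict, using the conservative rolling-update settings and the probe values from this page. The name "web", the image reference, and port 8080 are placeholders; get_path simply follows the dotted paths listed in the table above.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 10) -> dict:
    """Build a Deployment as a plain dict; name, image and port are placeholders."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Conservative profile: never drop below full capacity.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "minReadySeconds": 10,
            "progressDeadlineSeconds": 600,
            "revisionHistoryLimit": 10,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "terminationGracePeriodSeconds": 30,
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                        "readinessProbe": {
                            "httpGet": {"path": "/ready", "port": 8080},
                            "initialDelaySeconds": 5,
                            "periodSeconds": 5,
                            "failureThreshold": 3,
                        },
                        "livenessProbe": {
                            "httpGet": {"path": "/livez", "port": 8080},
                            "initialDelaySeconds": 15,
                            "periodSeconds": 10,
                            "failureThreshold": 3,
                        },
                    }],
                },
            },
        },
    }


def get_path(obj: dict, dotted: str):
    """Follow a dotted path such as 'spec.strategy.rollingUpdate.maxSurge'."""
    for key in dotted.split("."):
        obj = obj[key]
    return obj
```

Serialized with json.dumps(manifest, indent=2), the result can be written to a file and applied with kubectl apply -f, which accepts JSON as well as YAML.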