Rollback Strategies

Every deployment needs a rollback plan. The right rollback strategy depends on your deployment method, database state, and the severity of the issue. This guide covers the major rollback approaches and when to use each.

Immediate Rollback (Kubernetes)

Kubernetes retains previous ReplicaSets, up to the number set by the Deployment's revisionHistoryLimit (default 10). Rolling back is a single command, kubectl rollout undo, that switches the Deployment back to the previous ReplicaSet's pod template.

How it works

The rollback command updates the Deployment's pod template spec to match a previous revision. Kubernetes then performs a rolling update from the current (bad) version to the previous (known-good) version using the same maxSurge/maxUnavailable settings.
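As an illustration of the mechanism described above, the following sketch wraps the rollback in a small Python helper. The Deployment name, the namespace, and the 5m timeout are placeholder assumptions; the helper only assembles standard kubectl rollout undo and kubectl rollout status invocations and is not a definitive operational script.

```python
import subprocess
from typing import List, Optional


def rollback_command(deployment: str, namespace: str = "default",
                     to_revision: Optional[int] = None) -> List[str]:
    """Build the kubectl argv that rolls a Deployment back."""
    cmd = ["kubectl", "rollout", "undo", f"deployment/{deployment}",
           "--namespace", namespace]
    if to_revision is not None:
        # Without --to-revision, kubectl rolls back to the previous revision.
        cmd.append(f"--to-revision={to_revision}")
    return cmd


def status_command(deployment: str, namespace: str = "default",
                   timeout: str = "5m") -> List[str]:
    """Build the kubectl argv that waits for the rolling update to finish."""
    return ["kubectl", "rollout", "status", f"deployment/{deployment}",
            "--namespace", namespace, f"--timeout={timeout}"]


def roll_back(deployment: str, namespace: str = "default",
              to_revision: Optional[int] = None) -> None:
    """Trigger the rollback, then block until it completes or times out."""
    subprocess.run(rollback_command(deployment, namespace, to_revision), check=True)
    subprocess.run(status_command(deployment, namespace), check=True)
```

Calling roll_back("web") for a hypothetical Deployment named web would run the undo and then wait on the rollout, raising an error if either kubectl call exits non-zero.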

Rollback time

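Rollback time depends on replica count, rolling update settings, and how long new pods take to become ready. For a 10-replica Deployment with default settings (maxSurge and maxUnavailable both 25%), expect 2-5 minutes for a full rollback. If speed is critical, temporarily raise maxUnavailable, accepting reduced serving capacity while the rollback runs.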
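To make the effect of these settings concrete, here is a rough back-of-the-envelope estimator. It assumes a simplified wave model in which up to maxSurge + maxUnavailable new pods start at once and each wave takes seconds_until_ready (a hypothetical parameter covering scheduling, image pull, and readiness probes). The real controller works continuously rather than in discrete waves, so treat the result as an order-of-magnitude guide only.

```python
import math
from typing import Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Resolve an int or an 'N%' string to an absolute pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        # Kubernetes rounds maxSurge up and maxUnavailable down.
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def estimate_rollback_seconds(replicas: int, seconds_until_ready: int,
                              max_surge: Union[int, str] = "25%",
                              max_unavailable: Union[int, str] = "25%") -> int:
    """Approximate rollback duration under the simplified wave model."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    in_flight = max(1, surge + unavailable)  # new pods that can start together
    waves = math.ceil(replicas / in_flight)
    return waves * seconds_until_ready
```

With 10 replicas, default settings, and an assumed 60 seconds per wave, the model gives 2 waves; raising maxUnavailable to 100% collapses that to a single wave at the cost of taking all old pods out of service at once.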

Database Rollback Considerations

Application rollbacks are straightforward. Database rollbacks are not. Data written by the new version may not be compatible with the old version's schema expectations.

| Scenario | Rollback Approach | Risk |
|---|---|---|
| Additive-only migration (new column added) | Roll back application only. Leave new column in place. Old code ignores it. | Low |
| Column removed or renamed | Cannot roll back application without restoring the column. Requires a reverse migration. | High |
| Data format changed (e.g., JSON structure) | Old code may fail to parse new data. Need data transformation or dual-format support. | High |
| New data written by new version | Old code must handle unexpected data gracefully (unknown enum values, extra fields, new relationships). | Medium |

Key principle: Always design database migrations to be backward-compatible. If the old version of your application cannot function with the new schema, you do not have a safe rollback path.
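One way to keep old code functional against data written by a newer version is a tolerant reader. The sketch below is a minimal illustration, assuming a hypothetical orders row with id and status fields: unknown enum values map to an explicit UNKNOWN member, and fields the old version has never heard of are simply not read.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    UNKNOWN = "unknown"  # fallback for values this version does not recognize


@dataclass
class Order:
    order_id: str
    status: OrderStatus


def parse_order(row: Dict[str, Any]) -> Order:
    """Parse a row, tolerating extra fields and unrecognized status values."""
    try:
        status = OrderStatus(row["status"])
    except ValueError:
        # A newer version wrote a status this code predates; degrade gracefully.
        status = OrderStatus.UNKNOWN
    # Only known keys are read, so extra columns or JSON fields are ignored.
    return Order(order_id=str(row["id"]), status=status)
```

For example, a row such as {"id": 7, "status": "on_hold", "gift_note": "hi"} written by a newer release parses to an Order with status UNKNOWN instead of raising, which leaves the calling code free to decide how to treat it.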

Feature Flag Rollback

Feature flags provide the fastest rollback mechanism. No deployment, no pod replacement, no load balancer changes -- just toggle a flag.

  • Speed: Propagation time depends on SDK implementation. SSE/WebSocket: 1-5 seconds. Polling: up to the poll interval (typically 10-60 seconds).
  • Scope: Can disable a specific feature without affecting the rest of the release. More surgical than a full application rollback.
  • Limitation: Only works if the problematic code is behind a flag. Infrastructure issues, dependency failures, and performance regressions in unflagged code cannot be rolled back this way.
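The points above can be sketched with a toy flag client. FlagClient, the new-discount-engine flag name, and the discount logic are all hypothetical stand-ins rather than any real SDK's API; the point is that the risky path sits behind a check with a safe default, and that a toggle only takes effect once the client refreshes its cache.

```python
from typing import Callable, Dict, Iterable


class FlagClient:
    """Hypothetical in-memory stand-in for a feature flag SDK."""

    def __init__(self, fetch: Callable[[], Dict[str, bool]]):
        self._fetch = fetch
        self._cache: Dict[str, bool] = {}

    def refresh(self) -> None:
        # A real SDK would do this on a poll timer or on a streamed update.
        self._cache = dict(self._fetch())

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Defaulting to False means a missing flag falls back to the old path.
        return self._cache.get(name, default)


def checkout_total(prices: Iterable[float], flags: FlagClient) -> float:
    """Apply the new (flagged) discount path only while the flag is on."""
    total = sum(prices)
    if flags.is_enabled("new-discount-engine"):
        total = round(total * 0.9, 2)
    return total
```

Turning the flag off in the backing store changes the result of checkout_total only after the next refresh, which is the propagation delay described in the Speed bullet.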

Blue-Green Instant Switch

In a blue-green deployment, the previous environment remains running during the stabilization window. Rollback is a load balancer switch back to the previous environment.

| Mechanism | Rollback Time | Consideration |
|---|---|---|
| ALB target group swap | < 5 seconds | Previous environment must still be running and healthy |
| DNS weighted routing | 60-300 seconds (TTL dependent) | Clients may cache DNS; set low TTL before deploy |
| K8s Service selector | < 2 seconds | Both Deployments must exist with different labels |
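For the Service selector mechanism, the switch amounts to patching one label value. The sketch below assumes the two Deployments are distinguished by a hypothetical version label with values blue and green, and that the Service name and namespace are placeholders; it only builds a kubectl patch invocation, which uses a strategic merge patch by default, so other selector keys are left untouched.

```python
import json
from typing import List


def selector_switch_command(service: str, target_color: str,
                            namespace: str = "default") -> List[str]:
    """Build the kubectl argv that points a Service at one environment."""
    # Assumed labeling scheme: pods carry version=blue or version=green.
    patch = {"spec": {"selector": {"version": target_color}}}
    return ["kubectl", "patch", "service", service,
            "--namespace", namespace,
            "--patch", json.dumps(patch)]
```

Rolling back is then the same call with the previous color, for example selector_switch_command("web", "blue") after a failed switch to green.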

Immutable Infrastructure Rollback

In an immutable infrastructure model, servers are never modified after creation. Rollback means deploying the previous machine image or container image.

  • Container images: Every build produces a tagged, immutable image. Rollback means updating the Deployment to reference the previous image tag, so that image must still exist in the registry.
  • AMIs / VM images: Launch new instances from the previous AMI and terminate current instances. Slower than container rollback (minutes vs seconds) but equally reliable.
  • Image retention policy: Keep at least the 5 most recent images to allow rollback to any recent version. Configure lifecycle policies in ECR, GCR, or ACR accordingly.
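The retention rule can be expressed as a small selection function. This is an illustrative sketch, not a registry lifecycle policy: the Image type and the pinned parameter (tags that are currently deployed and should never expire) are assumptions added for the example.

```python
from datetime import datetime
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Image(NamedTuple):
    tag: str
    pushed_at: datetime


def split_by_retention(images: Iterable[Image], keep: int = 5,
                       pinned: FrozenSet[str] = frozenset()
                       ) -> Tuple[List[Image], List[Image]]:
    """Return (kept, expired): the newest `keep` images plus any pinned tags."""
    newest_first = sorted(images, key=lambda img: img.pushed_at, reverse=True)
    kept = newest_first[:keep]
    older = newest_first[keep:]
    # Pinned tags survive even when they fall outside the retention window.
    kept += [img for img in older if img.tag in pinned]
    expired = [img for img in older if img.tag not in pinned]
    return kept, expired
```

Running a check like this against the registry's image list before tightening a lifecycle policy shows which rollback targets would disappear.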

Rollback Decision Matrix

Use this matrix to choose the right rollback approach based on the situation.

| Situation | Recommended Rollback | Expected Recovery Time |
|---|---|---|
| Single feature broken, feature is flagged | Toggle feature flag off | 1-60 seconds |
| Blue-green deployment, within stabilization window | Switch load balancer back | 2-10 seconds |
| Rolling update, no DB migration | Kubernetes rollback to previous revision | 2-5 minutes |
| Rolling update with additive DB migration | Roll back application only, leave DB as-is | 2-5 minutes |
| Destructive DB migration applied | Run reverse migration, then roll back application | 10-60 minutes (depends on data volume) |
| Infrastructure-level failure (bad AMI, misconfigured infra) | Deploy previous infrastructure version via IaC | 5-30 minutes |
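The matrix can also be encoded as a lookup for use in a runbook tool. The sketch below is one possible encoding: the Incident fields are hypothetical, and the order in which the conditions are checked (infrastructure first, then flags, then destructive migrations ahead of the blue-green switch) is an assumption about precedence that the matrix itself does not specify.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    feature_flagged: bool = False             # broken feature sits behind a flag
    blue_green_standby_healthy: bool = False  # previous environment still running
    infrastructure_failure: bool = False      # bad AMI, misconfigured infra
    migration: str = "none"                   # "none", "additive", or "destructive"


def choose_rollback(incident: Incident) -> str:
    """Map an incident to the recommended rollback from the matrix."""
    if incident.infrastructure_failure:
        return "Deploy previous infrastructure version via IaC"
    if incident.feature_flagged:
        return "Toggle feature flag off"
    if incident.migration == "destructive":
        # Assumed to outrank the blue-green switch: old code needs the old schema.
        return "Run reverse migration, then roll back application"
    if incident.blue_green_standby_healthy:
        return "Switch load balancer back"
    if incident.migration == "additive":
        return "Roll back application only, leave DB as-is"
    return "Kubernetes rollback to previous revision"
```

A call such as choose_rollback(Incident(migration="additive")) returns the application-only rollback, matching the fourth row of the matrix.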