Blue-Green Deployment
Maintain two identical production environments. Deploy to the inactive environment, verify it, then switch all traffic at once. Rolling back means switching traffic back to the previous environment, which is near-instant.
Concept
In a blue-green deployment, two environments exist simultaneously: "blue" (current production) and "green" (next release). At any given time, only one environment receives live traffic. The deployment process targets the idle environment, runs validation, and then switches the router or load balancer to direct traffic to the newly updated environment.
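The key advantage is that the switchover is effectively atomic from the user's perspective: when traffic is switched at the load balancer or router, there is no extended period where some requests hit the old version and some hit the new version. A DNS-based switch is the exception, because cached records keep some clients on the old environment until the TTL expires. If something goes wrong, switching back is just as fast as the original cutover.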
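The idea can be reduced to a single pointer that decides where every request goes. The following is a toy model only, not a real router; the class name `BlueGreenRouter` and its methods are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Toy router: one pointer decides which environment gets all traffic."""

    active: str = "blue"

    @property
    def idle(self) -> str:
        # The environment that is not receiving traffic right now.
        return "green" if self.active == "blue" else "blue"

    def route(self, request_id: int) -> str:
        # Every request goes to whichever environment is active at call time.
        return self.active

    def switch(self) -> None:
        # Cutover and rollback are the same operation: flip the pointer.
        self.active = self.idle


router = BlueGreenRouter()
before = [router.route(i) for i in range(3)]
router.switch()  # cutover to green
after = [router.route(i) for i in range(3)]
router.switch()  # rollback to blue
```

Because cutover and rollback are the same pointer flip, the rollback path is exercised every time a deployment happens.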
Implementation Steps
- 1. Provision identical environments. Both blue and green must have the same infrastructure: compute, networking, storage, and configuration. Use infrastructure as code (Terraform, CloudFormation, Pulumi) to guarantee parity.
- 2. Deploy to the idle environment. Push the new application version to the environment that is not receiving traffic. Run database migrations if needed (see database handling below).
- 3. Run smoke tests and validation. Execute automated tests against the green environment using its direct URL (not through the public load balancer). Verify health endpoints, critical user flows, and integration points.
- 4. Switch traffic. Update the load balancer, DNS record, or router configuration to point to the green environment. This is the cutover moment.
- 5. Monitor and keep blue warm. After switching, monitor the green environment closely. Keep the blue environment running for a defined stabilization window (30 minutes to 24 hours) to enable instant rollback.
- 6. Decommission or recycle. Once the stabilization window passes, either tear down the former blue environment or keep it as the idle target for the next deployment.
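The steps above can be sketched as a small orchestration function in which the smoke checks gate the cutover. This is a minimal sketch under stated assumptions: the `state` dictionary, its keys, and the check callables are hypothetical stand-ins for real deployment tooling and real HTTP checks.

```python
from typing import Callable, Dict, List


def deploy_blue_green(
    state: Dict[str, str],
    version: str,
    smoke_checks: List[Callable[[str], bool]],
) -> bool:
    """Deploy `version` to the idle environment; cut over only if checks pass.

    `state` uses the hypothetical keys "active", "blue", and "green", where
    "blue" and "green" map to the version each environment currently runs.
    """
    idle = "green" if state["active"] == "blue" else "blue"
    state[idle] = version  # step 2: deploy to the idle environment

    # Step 3: validate the idle environment before it sees live traffic.
    if not all(check(idle) for check in smoke_checks):
        return False  # live traffic never left the active environment

    state["active"] = idle  # step 4: cutover
    return True


state = {"active": "blue", "blue": "v1", "green": "v0"}
ok = deploy_blue_green(state, "v2", [lambda env: True])       # checks pass
failed = deploy_blue_green(state, "v3", [lambda env: False])  # checks fail
```

In the second call the failing check leaves `state["active"]` untouched, which is the property that makes a bad release harmless to users.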
Load Balancer Configuration
The load balancer is the switch mechanism. The two common approaches:
Target group swap
Both environments register as separate target groups behind the same load balancer. The switch changes which target group the listener forwards to. This is the fastest approach, with cutover typically completing within seconds. AWS ALB supports it natively through target groups; GCP HTTP(S) Load Balancing and Azure Application Gateway offer the equivalent through backend services and backend pools.
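On AWS, the swap comes down to one `modify_listener` call on the ELBv2 API. The sketch below assumes a listener whose default action forwards to a single target group; the helper names and the ARN placeholders are illustrative, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def forward_action(target_group_arn: str) -> Dict[str, Any]:
    """Build a listener default action that forwards to one target group."""
    return {"Type": "forward", "TargetGroupArn": target_group_arn}


def switch_listener(elbv2_client: Any, listener_arn: str, target_group_arn: str) -> None:
    """Point the listener at the given target group (cutover or rollback)."""
    elbv2_client.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=[forward_action(target_group_arn)],
    )


# With boto3 installed and credentials configured, usage would look like:
#   import boto3
#   switch_listener(boto3.client("elbv2"), LISTENER_ARN, GREEN_TARGET_GROUP_ARN)
# LISTENER_ARN and GREEN_TARGET_GROUP_ARN are placeholders, not real values.
```

Listeners that route through additional rules rather than only the default action would need those rules updated as well.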
DNS-based switch
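Each environment has its own load balancer. A DNS CNAME or weighted record switches between them. This is simpler to set up but slower: resolvers and clients cache the old record, so some clients may still hit the old environment for minutes after the change. Lower the TTL to 60 seconds or less well before deploying, at least one full period of the previous TTL in advance, so that records cached with the longer TTL have expired by the time of the switch.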
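With Route 53, a DNS-based switch is an `UPSERT` of the record that points at the active load balancer. The sketch below only builds the change batch; the record and load balancer names are placeholders, and the helper `cname_upsert` is hypothetical. It assumes a plain CNAME on a subdomain, whereas a zone apex would need an alias record instead.

```python
from typing import Any, Dict


def cname_upsert(record_name: str, target_dns_name: str, ttl: int = 60) -> Dict[str, Any]:
    """Build a Route 53 change batch that repoints a CNAME at one environment."""
    return {
        "Comment": "blue-green cutover",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target_dns_name}],
                },
            }
        ],
    }


batch = cname_upsert("app.example.com.", "green-lb.example.com.")
# With boto3 installed, the batch would be applied roughly like this:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="ZONE_ID_PLACEHOLDER", ChangeBatch=batch)
```

Rollback is the same call with the blue load balancer's name as the target.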
Database Migration Handling
Databases are the hardest part of blue-green deployments because both environments typically share the same database. Schema changes must be backward-compatible.
| Pattern | Approach | Rollback Safety |
|---|---|---|
| Expand-contract | Add new columns/tables first (expand). Deploy green. Later remove old columns (contract). | High -- blue still reads old columns |
| Shared database | Both environments use the same database. All migrations must be backward-compatible. | Medium -- requires careful schema design |
| Separate databases | Each environment has its own database. Requires data replication between them. | High -- but complex replication setup |
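The expand phase of expand-contract can be demonstrated with an in-memory SQLite database. This is an illustrative sketch: the `users` table, the `display_name` column, and the `blue_*`/`green_*` functions are invented for the example, and a production migration would go through a migration tool rather than inline SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# Expand: add a nullable column. Existing queries are unaffected.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def blue_read(db: sqlite3.Connection) -> list:
    # Old version: only knows about `name`.
    return db.execute("SELECT id, name FROM users ORDER BY id").fetchall()


def green_write(db: sqlite3.Connection, name: str) -> None:
    # New version: writes both columns so blue keeps working after a rollback.
    db.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)", (name, name)
    )


def green_read(db: sqlite3.Connection) -> list:
    # New version: prefers the new column, falls back to the old one.
    return db.execute(
        "SELECT id, COALESCE(display_name, name) FROM users ORDER BY id"
    ).fetchall()


green_write(conn, "Grace Hopper")
blue_rows = blue_read(conn)    # blue still sees every row through `name`
green_rows = green_read(conn)  # green sees old and new rows alike
```

The contract step (dropping `name`) is deliberately left out: it belongs in a later release, after blue has been retired and nothing reads the old column.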
Rollback Procedure
- 1. Detect the issue. Monitoring alerts fire on error rate spike, latency increase, or failed health checks.
- 2. Switch traffic back. Revert the load balancer listener or DNS record to point to the blue environment. With target group swap, this takes under 5 seconds.
- 3. Verify blue is serving. Confirm through synthetic monitors and real traffic metrics that the blue environment is healthy.
- 4. Diagnose on green. The failed green environment is still running -- investigate logs and metrics without time pressure.
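The detection and switch-back steps can be sketched as a small decision function. The thresholds, the `Metrics` fields, and the `state` dictionary are assumptions chosen for illustration; real values would come from the monitoring system and the team's own error budget.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float
    health_check_ok: bool


def should_roll_back(
    current: Metrics,
    baseline: Metrics,
    max_error_rate: float = 0.02,
    max_latency_ratio: float = 1.5,
) -> bool:
    """Return True if any of the three rollback signals fires."""
    if not current.health_check_ok:
        return True
    if current.error_rate > max_error_rate:
        return True
    return current.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio


def roll_back(state: dict) -> dict:
    """Flip traffic back; the failed environment stays up for diagnosis."""
    previous = "blue" if state["active"] == "green" else "green"
    return {**state, "active": previous, "under_investigation": state["active"]}


baseline = Metrics(error_rate=0.001, p95_latency_ms=120.0, health_check_ok=True)
observed = Metrics(error_rate=0.08, p95_latency_ms=130.0, health_check_ok=True)
state = {"active": "green"}
if should_roll_back(observed, baseline):
    state = roll_back(state)
```

Comparing latency against a baseline taken from blue, rather than a fixed number, keeps the check meaningful for services whose normal latency varies.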
Cloud Provider Support
| Provider | Native Service | Switch Mechanism | Switch Time |
|---|---|---|---|
| AWS | CodeDeploy (ECS/EC2), Elastic Beanstalk | ALB target group swap, Route 53 weighted | <5 seconds (ALB), 60s+ (DNS) |
| GCP | Cloud Deploy, Cloud Run revisions | Traffic splitting on Cloud Run, URL map backend swap | <5 seconds (traffic split) |
| Azure | App Service deployment slots, Traffic Manager | Slot swap (warm), Traffic Manager profile | <10 seconds (slot swap) |
| Kubernetes | Service selector swap, Istio traffic shifting | Update Service selector labels or VirtualService route | <2 seconds (selector), <5s (Istio) |
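For the Kubernetes row, a selector swap is a patch to the Service's `spec.selector`. The sketch below assumes a hypothetical labeling scheme in which pods carry `version: blue` or `version: green`; the helper names are illustrative, and the client is passed in so the code runs without a cluster.

```python
from typing import Any, Dict


def selector_patch(color: str) -> Dict[str, Any]:
    """Build a patch that repoints a Service at pods labeled with `color`."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown environment: {color!r}")
    # Hypothetical label scheme: pods carry `version: blue` or `version: green`.
    return {"spec": {"selector": {"version": color}}}


def switch_service(core_v1: Any, name: str, namespace: str, color: str) -> None:
    """Apply the patch through a CoreV1Api-like client."""
    core_v1.patch_namespaced_service(
        name=name, namespace=namespace, body=selector_patch(color)
    )


# With the official `kubernetes` Python client installed, usage would look like:
#   from kubernetes import client, config
#   config.load_kube_config()
#   switch_service(client.CoreV1Api(), "web", "default", "green")
# The Service name "web" and namespace "default" are placeholders.
```

Only the `version` key is patched, so any other selector labels on the Service are left as they are.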