What this covers
DNS failover design, dual-CDN routing logic, AZ-failure playbooks, and security/ops guardrails for customer-facing workloads.
Implementation trail
- Prerequisites and DNS ownership
- AWS foundation deployment
- Secondary CDN setup
- Failover routing and AZ-failure response
- Operations, validation, and decommissioning
Establish DNS and edge prerequisites
Own the domain in Route 53 (or delegate to it) and stand up the Cloudflare zone with proxying and WAF rules before deploying AWS resources.
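- Decide which certificate the ALB presents: import a Cloudflare Origin CA certificate into ACM for the ALB listener, or use a public ACM certificate. CloudFront also fronts this ALB and only accepts origin certificates from publicly trusted CAs, so a public ACM certificate is the safer choice.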
- Collect admin CIDRs and instance types for the ASG ahead of deployment.
- Mirror essential WAF and rate-limit rules between Cloudflare and AWS WAF where possible.
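Admin CIDRs collected by hand are easy to get wrong, so it can help to check them before they reach the template. The following is a minimal sketch, assuming IPv4 ranges and a hypothetical policy that nothing broader than a /16 is acceptable as an admin range; `validate_admin_cidrs` and `min_prefix` are illustrative names, not part of the stack.

```python
import ipaddress


def validate_admin_cidrs(cidrs, min_prefix=16):
    """Return sorted, de-duplicated IPv4 CIDRs; raise ValueError on bad input.

    min_prefix is a hypothetical policy knob: ranges broader than this
    (for example 0.0.0.0/0) are rejected as admin ranges.
    """
    normalized = set()
    for raw in cidrs:
        # strict=True rejects CIDRs with host bits set, e.g. 10.0.0.5/24.
        network = ipaddress.IPv4Network(raw.strip(), strict=True)
        if network.prefixlen < min_prefix:
            raise ValueError(f"{raw!r} is broader than /{min_prefix}")
        normalized.add(str(network))
    return sorted(normalized)
```

Rejecting overly broad ranges at collection time keeps an accidental 0.0.0.0/0 from being passed into security group rules later.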
Deploy the AWS foundation
Launch cf_templates/cloudflare-failover-ha.yaml to provision VPC, dual-AZ subnets, ALB, ASG, S3 asset bucket, and CloudFront secondary.
- Attach health checks to a /health path and verify ALB targets are balanced across AZs.
- Keep S3 as a failover origin for static assets to reduce blast radius if compute scales slowly.
- Parameterize desired and max ASG capacity for burst handling.
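The capacity parameters above could be assembled as follows. This is only a sketch: the parameter keys (`AdminCidr`, `InstanceType`, `DesiredCapacity`, `MaxCapacity`) are hypothetical, since the template's actual parameter names are not shown here, and the rule that desired capacity is at least 2 is an assumption made so that each of the two AZs can hold an instance.

```python
def build_stack_parameters(admin_cidr, instance_type, desired_capacity, max_capacity):
    """Build a CloudFormation-style parameter list for the failover stack.

    All parameter keys are hypothetical; match them to the real template.
    """
    # Assumption: at least one instance per AZ in a dual-AZ layout.
    if not 2 <= desired_capacity <= max_capacity:
        raise ValueError("need 2 <= desired_capacity <= max_capacity")
    values = {
        "AdminCidr": admin_cidr,
        "InstanceType": instance_type,
        # CloudFormation parameter values are passed as strings.
        "DesiredCapacity": str(desired_capacity),
        "MaxCapacity": str(max_capacity),
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]
```

The returned list has the shape that the `Parameters` argument of boto3's CloudFormation `create_stack` call expects, so it can be passed through unchanged once the keys match the template.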
Wire dual-CDN and DNS failover
The Route 53 primary failover record points at Cloudflare; the secondary failover record points at the CloudFront distribution that fronts the same ALB origin.
- Attach a Route 53 health check to the primary record so traffic fails over from Cloudflare to CloudFront automatically when the primary path is unhealthy.
- Keep a tertiary ALB alias record disabled by default as an escape hatch.
- Test failover regularly to ensure TTLs and health checks behave as expected.
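The primary/secondary pairing above can be sketched as a Route 53 change batch. This assumes the hostname is a subdomain (a CNAME cannot sit at the zone apex) and that both edges are reached by CNAME; the function names, the `SetIdentifier` values, and any targets or health check IDs passed in are hypothetical. The last helper is a rough estimate of detection time plus TTL, not a guarantee, because resolvers do not always honor TTLs.

```python
def failover_record(name, target, role, ttl=60, health_check_id=None):
    """Build one failover CNAME record set (both roles must share name and type)."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": f"{role.lower()}-edge",  # hypothetical identifier
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name, cloudflare_target, cloudfront_target,
                          health_check_id, ttl=60):
    """Cloudflare as PRIMARY (health-checked), CloudFront as SECONDARY."""
    primary = failover_record(name, cloudflare_target, "PRIMARY", ttl, health_check_id)
    secondary = failover_record(name, cloudfront_target, "SECONDARY", ttl)
    return {
        "Comment": "Cloudflare primary, CloudFront secondary",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": primary},
            {"Action": "UPSERT", "ResourceRecordSet": secondary},
        ],
    }


def estimated_failover_seconds(request_interval, failure_threshold, ttl):
    """Rough estimate: time to mark the primary unhealthy plus one TTL."""
    return request_interval * failure_threshold + ttl
```

The batch has the shape expected by the `ChangeBatch` argument of Route 53's `change_resource_record_sets`. With a 30-second check interval, a failure threshold of 3, and a 60-second TTL, the estimate comes to 150 seconds, which is a useful baseline to compare against measured results from failover tests.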
Handle AZ and CDN failures gracefully
- Confirm ALB cross-zone load balancing is on (it is on by default for ALBs and can only be turned off per target group) and raise ASG capacity in the surviving AZ during events.
- Monitor CloudFront and Cloudflare error rates; if both CDNs degrade, temporarily route directly to the ALB by enabling the tertiary alias record.
- Confirm ingestion through API Gateway/Lambda or private API targets continues while routing shifts.
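The decisions in this playbook can be sketched as two small helpers. Both are illustrative only: the 5% error-rate threshold, the route labels, and the function names are assumptions, and the capacity estimate assumes instances were evenly balanced across AZs before the failure.

```python
def choose_route(cloudflare_error_rate, cloudfront_error_rate, threshold=0.05):
    """Pick the edge path, preferring Cloudflare, then CloudFront, then the ALB.

    Error rates are fractions (0.02 means 2%); threshold is hypothetical.
    """
    if cloudflare_error_rate < threshold:
        return "cloudflare"
    if cloudfront_error_rate < threshold:
        return "cloudfront"
    # Both CDNs degraded: fall back to the direct-to-ALB escape hatch.
    return "alb-direct"


def capacity_shortfall(desired_capacity, az_count=2, failed_azs=1):
    """Instances the surviving AZs must add to restore desired capacity.

    Assumes an even spread before the failure; integer division makes the
    estimate conservative when the count does not divide evenly.
    """
    if not 0 <= failed_azs < az_count:
        raise ValueError("failed_azs must be in [0, az_count)")
    surviving_now = desired_capacity * (az_count - failed_azs) // az_count
    return desired_capacity - surviving_now
```

For example, with a desired capacity of 4 across two AZs, losing one AZ leaves a shortfall of 2 instances, so the ASG maximum needs at least that much headroom in the surviving AZ.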
Operate and secure the stack
- Track ALB 5xx, ASG capacity, RDS failover events, and CDN health in CloudWatch and Cloudflare analytics.
- Enforce HTTPS end-to-end and restrict SSH/database access; prefer SSM Session Manager and IAM auth.
- Plan decommission steps to clean up DNS, Cloudflare entries, and AWS resources when retiring the stack.
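As one concrete form of the ALB 5xx tracking above, the sketch below builds the arguments for a CloudWatch alarm on load-balancer-generated 5xx responses. The threshold, period, and alarm naming scheme are assumptions to tune per workload; the load balancer dimension is the `app/<name>/<id>` suffix of the ALB ARN.

```python
def alb_5xx_alarm(load_balancer_dimension, threshold=25, period=60,
                  evaluation_periods=3, sns_topic_arn=None):
    """Build keyword arguments for a CloudWatch alarm on ALB 5xx counts.

    load_balancer_dimension looks like "app/my-alb/50dc6c495c0c9188".
    Default threshold and periods are hypothetical starting points.
    """
    parts = load_balancer_dimension.split("/")
    if len(parts) != 3 or parts[0] != "app":
        raise ValueError("expected 'app/<name>/<id>'")
    alarm = {
        "AlarmName": f"alb-5xx-{parts[1]}",  # hypothetical naming scheme
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_ELB_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        "Statistic": "Sum",
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # The metric is only reported when errors occur, so gaps are healthy.
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        alarm["AlarmActions"] = [sns_topic_arn]
    return alarm
```

The resulting dictionary can be passed as keyword arguments to boto3's CloudWatch `put_metric_alarm`. A similar alarm on `HTTPCode_Target_5XX_Count` would cover errors returned by the instances rather than by the load balancer itself.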