Resilience & Edge

Cloudflare-fronted app with CloudFront and AZ failover

Layer Cloudflare with Route 53 health checks and CloudFront so customer apps stay reachable even if a CDN or AZ fails.

What this covers

DNS failover design, dual-CDN routing logic, AZ-failure playbooks, and security/ops guardrails for customer-facing workloads.

Implementation trail

  • Prerequisites and DNS ownership
  • AWS foundation deployment
  • Secondary CDN setup
  • Failover routing and AZ-failure response
  • Operations, validation, and decommissioning

Establish DNS and edge prerequisites

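Host the domain's authoritative DNS in Route 53 (register it there or delegate to it), and add the site to Cloudflare with proxying and WAF rules before deploying AWS resources. Because Route 53 rather than Cloudflare stays authoritative, the Cloudflare side has to use a partial (CNAME) setup.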

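  • Terminate TLS on the ALB with a publicly trusted ACM certificate. Cloudflare (in Full (strict) mode) and CloudFront both validate it, whereas a Cloudflare Origin CA certificate is trusted only by Cloudflare and would break the CloudFront path to the same ALB.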
  • Collect admin CIDRs and instance types for the ASG ahead of deployment.
  • Mirror essential WAF and rate-limit rules between Cloudflare and AWS WAF where possible.
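Because both CDNs reach the same ALB, its certificate has to match every name the edges present during the TLS handshake. The sketch below checks that coverage offline. The hostnames and SAN list are hypothetical, and it assumes the common behaviors that Cloudflare sends the visitor's hostname as SNI and that CloudFront, unless the Host header is forwarded, uses its configured origin domain name (here a hypothetical origin.example.com pointing at the ALB).

```python
def san_covers(san: str, host: str) -> bool:
    """Return True if one certificate SAN entry matches a hostname.

    Uses the single-label wildcard rule: "*.example.com" matches
    "app.example.com" but not "example.com" or "a.b.example.com".
    """
    san = san.lower().rstrip(".")
    host = host.lower().rstrip(".")
    if san.startswith("*."):
        label, _, rest = host.partition(".")
        return bool(label) and bool(rest) and rest == san[2:]
    return san == host


def uncovered_hosts(sans: list[str], hosts: list[str]) -> list[str]:
    """List the hostnames that no SAN on the certificate covers."""
    return [h for h in hosts if not any(san_covers(s, h) for s in sans)]


if __name__ == "__main__":
    # Hypothetical SANs on the ACM certificate attached to the ALB HTTPS listener.
    alb_cert_sans = ["example.com", "*.example.com"]
    # Hypothetical names the edges present to the origin via SNI.
    edge_hosts = ["app.example.com", "origin.example.com", "api.eu.example.com"]
    print(uncovered_hosts(alb_cert_sans, edge_hosts))  # ['api.eu.example.com']
```

A nested name such as api.eu.example.com needs its own SAN (or a *.eu.example.com wildcard) before either CDN can use strict origin validation against it.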

Deploy the AWS foundation

Launch cf_templates/cloudflare-failover-ha.yaml to provision VPC, dual-AZ subnets, ALB, ASG, S3 asset bucket, and CloudFront secondary.

  • Attach health checks to a /health path and verify ALB targets are balanced across AZs.
  • Keep S3 as a failover origin for static assets to reduce blast radius if compute scales slowly.
  • Parameterize desired and max ASG capacity for burst handling.
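Using S3 as a failover origin for static assets maps naturally onto a CloudFront origin group. The sketch below builds one OriginGroups item in the DistributionConfig shape; the origin IDs are hypothetical and must match origins already defined in the distribution, and the failover-code whitelist reflects the status codes CloudFront accepts for origin failover.

```python
import json

# Status codes CloudFront accepts as origin-failover triggers.
ALLOWED_FAILOVER_CODES = {400, 403, 404, 416, 500, 502, 503, 504}


def origin_group(group_id, primary_origin_id, failover_origin_id,
                 status_codes=(500, 502, 503, 504)):
    """Build one OriginGroups item in the CloudFront DistributionConfig shape.

    CloudFront sends a request to the second member when the first returns a
    listed status code, refuses the connection, or times out.
    """
    unsupported = set(status_codes) - ALLOWED_FAILOVER_CODES
    if unsupported:
        raise ValueError(f"unsupported failover codes: {sorted(unsupported)}")
    return {
        "Id": group_id,
        "FailoverCriteria": {
            "StatusCodes": {"Quantity": len(status_codes), "Items": list(status_codes)}
        },
        "Members": {
            "Quantity": 2,
            # Order matters: the first member is primary, the second is the fallback.
            "Items": [{"OriginId": primary_origin_id}, {"OriginId": failover_origin_id}],
        },
    }


if __name__ == "__main__":
    # Hypothetical origin IDs; they must match Origins.Items[].Id in the same distribution.
    groups = {"Quantity": 1, "Items": [origin_group("static-assets", "alb-origin", "s3-assets")]}
    print(json.dumps({"OriginGroups": groups}, indent=2))
```

The cache behavior for static paths would then set its TargetOriginId to the group ID. Origin failover applies only to GET, HEAD, and OPTIONS requests, and the bucket has to hold the same object keys the ALB serves for it to be a useful fallback.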

Wire dual-CDN and DNS failover

Route 53 primary points at Cloudflare; secondary points at the CloudFront distribution that fronts the same ALB origin.
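A minimal sketch of that record pair follows, built as the ChangeBatch dictionary that boto3's Route 53 client accepts in change_resource_record_sets(HostedZoneId=..., ChangeBatch=...). The names are hypothetical: app.example.com as the served hostname, a Cloudflare partial-setup target of the form app.example.com.cdn.cloudflare.net, and a placeholder CloudFront domain and health check ID. Both records are CNAMEs because Route 53 failover pairs must share a name and type, which also means this layout assumes a subdomain rather than the zone apex; the CloudFront distribution must list app.example.com as an alternate domain name.

```python
import json


def failover_change_batch(record_name, cloudflare_target, cloudfront_domain,
                          health_check_id, ttl=60):
    """Return a Route 53 ChangeBatch: Cloudflare as PRIMARY, CloudFront as SECONDARY."""
    primary = {
        "Name": record_name,
        "Type": "CNAME",
        "SetIdentifier": "primary-cloudflare",
        "Failover": "PRIMARY",
        "TTL": ttl,
        "ResourceRecords": [{"Value": cloudflare_target}],
        # Route 53 answers with the secondary only while this check is failing.
        "HealthCheckId": health_check_id,
    }
    secondary = {
        "Name": record_name,
        "Type": "CNAME",
        "SetIdentifier": "secondary-cloudfront",
        "Failover": "SECONDARY",
        "TTL": ttl,
        "ResourceRecords": [{"Value": cloudfront_domain}],
    }
    return {
        "Comment": "Cloudflare primary, CloudFront secondary",
        "Changes": [{"Action": "UPSERT", "ResourceRecordSet": r} for r in (primary, secondary)],
    }


if __name__ == "__main__":
    batch = failover_change_batch(
        "app.example.com",                     # hypothetical hostname
        "app.example.com.cdn.cloudflare.net",  # hypothetical Cloudflare CNAME target
        "d111111abcdef8.cloudfront.net",       # hypothetical distribution domain
        "hc-placeholder-id",                   # hypothetical Route 53 health check ID
    )
    print(json.dumps(batch, indent=2))
```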

  • Use health checks to fail traffic from Cloudflare to CloudFront automatically when the primary path is unhealthy.
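  • Prepare a direct-to-ALB alias record as a manual escape hatch, but keep it as a ready-to-apply change rather than a live record: Route 53 failover routing has no tertiary role and records cannot be disabled, so switching to it means replacing the failover pair.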
  • Test failover regularly to ensure TTLs and health checks behave as expected.
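When rehearsing, it helps to know roughly how long cutover should take so a slow result stands out. The sketch below uses a simple approximation, not a Route 53 guarantee: the primary is marked unhealthy after about request_interval × failure_threshold seconds of failed checks, and resolvers may keep the cached primary answer for up to one TTL afterwards. Resolvers that ignore TTLs and Route 53's aggregation of checker results across regions can stretch the real figure.

```python
def worst_case_cutover_seconds(request_interval: int, failure_threshold: int, ttl: int) -> int:
    """Rough upper bound on how long clients may keep using the failed primary path."""
    if request_interval not in (10, 30):
        raise ValueError("Route 53 health checks run every 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure_threshold must be between 1 and 10")
    # Detection time plus one TTL of resolver caching.
    return request_interval * failure_threshold + ttl


if __name__ == "__main__":
    # Route 53 defaults (30 s interval, threshold 3) with a 60 s record TTL.
    print(worst_case_cutover_seconds(30, 3, 60))  # 150
```

If a rehearsal takes much longer than this estimate, check resolver caching and the health check's target path before tuning thresholds.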

Handle AZ and CDN failures gracefully

  • Keep ALB cross-zone load balancing enabled (it is on by default for ALBs), and raise ASG desired capacity during events so instances in the surviving AZ absorb the load.
  • Monitor CloudFront and Cloudflare error rates; temporarily prioritize direct-to-ALB routing if both CDNs degrade.
  • Confirm ingestion through API Gateway/Lambda or private API targets continues while routing shifts.
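Raising capacity only helps if the ASG's maximum leaves room for it. The sketch below computes a conservative sizing for losing one AZ: with k AZs and a steady-state need of R instances, each AZ needs ⌈R / (k − 1)⌉ instances so the survivors still meet demand. The function name is hypothetical; the total can serve either as pre-provisioned capacity or as a ceiling for MaxSize when scaling reactively.

```python
import math


def capacity_for_az_loss(required_instances: int, az_count: int) -> tuple[int, int]:
    """Return (instances per AZ, total) so the fleet meets demand with one AZ lost."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive the loss of one")
    # The remaining az_count - 1 AZs must together carry the full requirement.
    per_az = math.ceil(required_instances / (az_count - 1))
    return per_az, per_az * az_count


if __name__ == "__main__":
    print(capacity_for_az_loss(6, 2))  # (6, 12)
    print(capacity_for_az_loss(6, 3))  # (3, 9)
```

The dual-AZ layout in this playbook is the expensive case: surviving one AZ loss means each AZ must be able to carry the whole load on its own.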

Operate and secure the stack

  • Track ALB 5xx, ASG capacity, RDS failover events, and CDN health in CloudWatch and Cloudflare analytics.
  • Enforce HTTPS end-to-end and restrict SSH/database access; prefer SSM Session Manager and IAM auth.
  • Plan decommission steps to clean up DNS, Cloudflare entries, and AWS resources when retiring the stack.
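For the ALB 5xx tracking, a sketch of the keyword arguments for CloudWatch's PutMetricAlarm is below; they could be passed as boto3.client("cloudwatch").put_metric_alarm(**alb_5xx_alarm(...)). The threshold, periods, SNS topic, and load balancer dimension are hypothetical. It alarms on HTTPCode_ELB_5XX_Count (errors generated by the ALB itself); HTTPCode_Target_5XX_Count is the separate metric for errors returned by the instances.

```python
def alb_5xx_alarm(load_balancer_dimension: str, sns_topic_arn: str,
                  threshold: int = 50, period_s: int = 60, periods: int = 5) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm on ALB-generated 5xx responses.

    load_balancer_dimension uses the "app/<name>/<id>" form CloudWatch expects.
    """
    name = load_balancer_dimension.split("/")[1]
    return {
        "AlarmName": f"alb-5xx-{name}",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_ELB_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # No datapoints for a count metric means no errors were generated.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    print(alb_5xx_alarm(
        "app/web-alb/50dc6c495c0c9188",                    # hypothetical ALB dimension
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # hypothetical SNS topic
    ))
```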

Why this approach

  • Cost-effective resilience: Single origin infrastructure serves both CDNs, avoiding duplication costs
  • Automatic recovery: Route 53 health checks eliminate manual intervention during CDN outages
  • Security continuity: AWS WAF rules that mirror Cloudflare's core rules keep essential protections in place during failover
  • Performance optimization: Cloudflare handles everyday traffic at its edge, while CloudFront keeps the app reachable when the Cloudflare path fails
  • Operational simplicity: Same deployment pipeline serves both edge networks without complex coordination

Alternative approaches

Multi-CDN with separate origins

Deploy identical applications behind each CDN with independent infrastructure

Pros:

  • Complete failure isolation
  • Independent scaling
  • No shared bottlenecks

Cons:

  • 2x infrastructure costs
  • Complex deployment coordination
  • Data consistency challenges

When to consider:

Enterprise applications with strict availability SLAs and budget for redundant infrastructure

Single CDN with multi-region origins

Use one CDN with origins in multiple AWS regions for geographic redundancy

Pros:

  • Simpler CDN management
  • Geographic distribution
  • Lower complexity

Cons:

  • CDN single point of failure
  • Cross-region latency
  • No protection against provider outages

When to consider:

Applications primarily concerned with regional AWS outages rather than CDN provider issues

Edge-native serverless (Cloudflare Workers + AWS Lambda@Edge)

Deploy application logic at the edge using serverless compute

Pros:

  • Ultra-low latency
  • Distributed compute
  • Auto-scaling

Cons:

  • Vendor lock-in
  • Limited runtime environments
  • Complex debugging
  • Higher costs at scale

When to consider:

Lightweight applications with simple logic that benefit from edge compute

Contextual factors

  • Global SaaS applications often target consistent sub-200ms response times across continents
  • API-heavy workloads need different resilience patterns than browser-based applications
  • Compliance requirements may mandate specific geographic data residency and security controls
  • Team expertise with specific CDN providers affects operational complexity and incident response
  • Budget constraints often favor shared infrastructure over fully redundant deployments
  • Customer tolerance for brief outages vs. performance degradation varies by industry

Need dual-CDN failover patterns?

We can configure Cloudflare primary with CloudFront secondary and rehearse AZ failover so your customers stay online during edge incidents.
