Global SaaS

Keeping a customer application online during a Cloudflare outage

This case study shows how Route 53 failover to CloudFront, combined with dual-AZ origins, kept a SaaS application reachable, secure, and ingesting data during a Cloudflare control-plane disruption.

Availability during outage

99.94% success

Latency change

+20 ms median

Data loss

0 (Kinesis/S3 intact)

Overview

The team routed browser and mobile traffic through Cloudflare, while API ingestion landed in AWS us-east-1 on ALB, ECS, and API Gateway/Lambda.

As an escape hatch, Route 53 was configured with health-checked failover to an Amazon CloudFront distribution that reused the same ALB origin.

Challenges

  • Cloudflare control-plane and POP issues produced widespread 522/525 errors and elevated latency.
  • Customer traffic began failing in affected regions within two minutes of the bad configuration rollout.
  • Leadership needed assurance that ingestion pipelines and TLS posture would survive a dual-CDN failover.

Approach

  • Health-checked DNS failover

    Route 53 monitored the Cloudflare hostname and automatically shifted traffic to CloudFront within about one minute of detecting the outage.

  • Capacity buffer for origin services

    ALB targets were spread across two AZs, and ECS tasks scaled up 2x to absorb CloudFront cache misses while keeping /health endpoints green.

  • Ingestion continuity and security parity

    API Gateway and Lambda kept writing to Kinesis and S3 without interruption, while AWS WAF mirrored Cloudflare’s critical rules to maintain protection.
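
The health-checked DNS failover described above can be sketched as a pair of Route 53 failover records. This is a minimal illustration, not the team's actual configuration: the hostnames, the health check ID, the 60-second TTL, and the helper names `failover_record` and `build_change_batch` are all assumptions made for the example. Both records use the CNAME type because failover records must share the same name and type.

```python
from typing import Any, Dict, Optional


def failover_record(
    name: str,
    target: str,
    role: str,
    set_id: str,
    ttl: int = 60,
    health_check_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one UPSERT change for a Route 53 failover CNAME record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record: Dict[str, Any] = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": target}],
    }
    # Only the primary needs a health check; when it fails,
    # Route 53 starts answering with the secondary record.
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def build_change_batch(
    name: str, primary: str, secondary: str, health_check_id: str
) -> Dict[str, Any]:
    """Assemble the ChangeBatch for a primary/secondary CDN failover pair."""
    return {
        "Comment": "Primary CDN with health-checked failover to CloudFront",
        "Changes": [
            failover_record(name, primary, "PRIMARY", "cloudflare-primary",
                            health_check_id=health_check_id),
            failover_record(name, secondary, "SECONDARY", "cloudfront-secondary"),
        ],
    }


# All values below are hypothetical placeholders.
CHANGE_BATCH = build_change_batch(
    name="app.example.com",
    primary="app.example.com.cdn.cloudflare.net",
    secondary="d111111abcdef8.cloudfront.net",
    health_check_id="00000000-0000-0000-0000-000000000000",
)
```

With boto3, a batch like this would typically be passed to `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=CHANGE_BATCH)`. A short TTL matters here because resolvers keep serving the primary answer until their cached copy expires.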

Impact delivered

  • Route 53 failover preserved availability with only a brief DNS cache blip; customers could continue browsing and transacting.
  • Median latency rose by roughly 20 ms during CloudFront cache warm-up but stayed within SLO.
  • No data loss occurred; ingestion pipelines continued operating and security controls remained enforced.

Key lessons

  • Validate Route 53 failover paths regularly so TTLs and health checks behave during real incidents.
  • Mirror essential WAF and rate-limiting rules between Cloudflare and CloudFront to preserve security posture.
  • Pre-warm critical CloudFront caches and rehearse CDN/AZ failure game days to reduce surprise during outages.
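
One way to make the first lesson concrete is to estimate the failover window before a game day and compare it with what is observed. The sketch below uses a deliberately simplified model: detection time is the health check request interval multiplied by the failure threshold, and clients may keep the old answer for up to one record TTL after that. The function names and the example values are assumptions for illustration, not figures from this incident, and the model ignores resolvers that disregard TTLs.

```python
def detection_seconds(request_interval_s: int, failure_threshold: int) -> int:
    """Approximate time for a health check to be declared unhealthy."""
    # Route 53 health checks use a 10 s or 30 s request interval
    # and a failure threshold between 1 and 10.
    if request_interval_s not in (10, 30):
        raise ValueError("request interval must be 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure threshold must be between 1 and 10")
    return request_interval_s * failure_threshold


def worst_case_failover_seconds(
    request_interval_s: int, failure_threshold: int, record_ttl_s: int
) -> int:
    """Detection time plus one full TTL for cached answers to expire."""
    if record_ttl_s < 0:
        raise ValueError("TTL must be non-negative")
    return detection_seconds(request_interval_s, failure_threshold) + record_ttl_s


def within_budget(estimate_s: int, budget_s: int) -> bool:
    """True when the estimated failover window fits the recovery budget."""
    return estimate_s <= budget_s


# Hypothetical settings: 10 s interval, threshold 3, 60 s TTL.
estimate = worst_case_failover_seconds(10, 3, 60)
print(estimate)                      # 90
print(within_budget(estimate, 120))  # True
```

If a rehearsal shows a window much longer than the estimate, the usual suspects are a TTL set higher than intended or a health check that is slower to fail than its configuration suggests.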
