Postmortems/Incident Reports/War Stories from real-world failures

Algolia #

Atlassian #

Authzed #

AWS #

Bungie #

Cloudflare #

Celer Bridge #

DataDog #

DataSpring #

  • Datacenter and tornado
    • Website is in Czech, use a translation service to read it. archive.today link
    • This is a story of how a data center dealt with a tornado, a good reminder to verify your offsite backups, a disaster recovery plan, and conduct disaster recovery dry runs.

Deno #

DoorDash #

Facebook #

Fastly #

Garmin #

GitHub #

GitLab #

Google Cloud #

Grafana Labs #

Independent Stories #

Indian Registry for Internet Names and Numbers (IRINN) #

KLAYswap #

Loom.com #

Netflix #

Nomad Bridge #

n8n #

Oracle Cloud #

  • Datacenter cooling infrastructure failed in UK South - July 19, 2022
    • Following unseasonably high temperatures in the UK South (London) region, two cooler units in the data center experienced a failure when they were required to operate above their design limits. As a result, temperatures in the data center began to climb causing a subset of Compute infrastructure to go into protective shut down.
    • Google Cloud also had cooling related failure in London (europe-west2)

Roblox #

Rogers Communications #

Salesforce #

Stripe #

Slack #

Twitter #

Verizon #