Welcome to Failure Modes
Failure Modes is a collection of literature on how and why software systems fail.
Running things in production is hard, and running distributed systems is even harder.
Failure Modes is an effort to curate resources and stories from the community, so that we can learn and get better at running large-scale software in production.
See the announcement blog post.
Please send a pull request to extend this collection.
Contributions can be anything: incident postmortems, blog posts, projects, talks, tweets, or research.
Huge thanks to our contributors.
Keep in touch
Subscribe to the Failure Modes Newsletter to get blog posts, talks, notes, and research on building and running production systems in your inbox.
Have suggestions or questions? Reach out on Twitter: @electron0zero