Living with failure
fallacies of distributed computing
- original ones from Deutsch
- emphasis on unreliable networks
- there are some concrete examples and data in Designing Data-Intensive Applications, e.g. about failures in cloud environments
- see also the Bailis/Kingsbury article ("The Network is Reliable")
other problems
- pitfall: treating a remote call as if it were just a local function call
- see "A Note on Distributed Computing", Waldo
- unreliable clocks
- nodes die / expect failures
- (maybe) end to end system design
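Possible illustration for the remote-call pitfall above: a rough sketch, not a real RPC client. Unlike a local call, a remote call has a third outcome besides "returned" and "raised": the caller times out and cannot tell whether the work happened. FlakyCounter, RemoteTimeout, call_with_retry and the scripted fault names are all hypothetical, made up for this sketch. It assumes the server deduplicates by request id, which is what makes the retry safe.

```python
class RemoteTimeout(Exception):
    """Caller gave up waiting; the request may or may not have been applied."""


class FlakyCounter:
    """Hypothetical stand-in for a remote service, with scripted network faults."""

    def __init__(self, faults):
        self.value = 0
        self._seen = set()            # request ids already applied
        self._faults = iter(faults)   # one fault per incoming call, then "ok"

    def increment(self, request_id):
        fault = next(self._faults, "ok")
        if fault == "drop_request":
            # The request never reached the server: nothing happened.
            raise RemoteTimeout("no reply")
        if request_id not in self._seen:
            # Deduplicate so that a retried request is applied only once.
            self._seen.add(request_id)
            self.value += 1
        if fault == "drop_reply":
            # The work was done, but the caller sees the same timeout as above.
            raise RemoteTimeout("no reply")
        return self.value


def call_with_retry(service, request_id, attempts=3):
    """Retry on timeout, reusing the same request id on every attempt."""
    for _ in range(attempts):
        try:
            return service.increment(request_id)
        except RemoteTimeout:
            continue  # outcome unknown: safe to retry only because of the dedup
    raise RemoteTimeout(f"gave up after {attempts} attempts")


service = FlakyCounter(["drop_request", "drop_reply", "ok"])
result = call_with_retry(service, "req-1")
print(result, service.value)
```

Both timeouts look identical to the caller, although only the second one changed state on the server. Without the request id check, the retry after the lost reply would increment the counter twice.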
Conclusions
- for theory: connect with the models chapter for a more formal treatment, which is required for consensus algorithms
- for practice: "let it crash" philosophy from Erlang and related papers
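Possible illustration for the "let it crash" idea: a minimal sketch in Python rather than Erlang, so it only imitates the shape of a supervisor and is not how OTP implements one. The worker does no defensive error handling; when it fails, the supervisor throws it away and starts a fresh one with clean state. The names supervise and make_worker, the restart limit, and the choice to retry the failed job on the new worker are all assumptions of this sketch.

```python
def make_worker():
    """Build a worker with fresh private state (hypothetical example worker)."""
    state = {"handled": 0}

    def worker(job):
        state["handled"] += 1
        if state["handled"] == 3:
            # Simulated bug: the worker's state goes bad on its third job.
            raise RuntimeError("worker state corrupted")
        return job * 2

    return worker


def supervise(make_worker, jobs, max_restarts=3):
    """Run jobs through a worker, replacing it with a fresh one on any crash."""
    worker = make_worker()
    restarts = 0
    results = []
    for job in jobs:
        while True:
            try:
                results.append(worker(job))
                break
            except Exception:
                restarts += 1
                if restarts > max_restarts:
                    raise  # too many crashes: escalate instead of looping forever
                worker = make_worker()  # discard the old state and start clean
    return results, restarts


results, restarts = supervise(make_worker, [1, 2, 3, 4, 5])
print(results, restarts)
```

The recovery path is the same as the startup path, which is the point the Crash-Only Software reference makes as well. The restart limit is there so that a job that always crashes the worker gets escalated rather than retried forever.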
References
TODO: review these references, incorporate what's useful, remove the rest
- The Trouble with Distributed Systems - Designing Data-Intensive Applications chapter 8
- Distributed Systems Introduction and Overview - Database Internals chapter 8
- Fallacies of distributed computing
- Distribunomicon - Learn You Some Erlang for Great Good!
- The network is reliable
- Making reliable distributed systems in the presence of software errors
- Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies
- Crash-Only Software
- A Note on Distributed Computing