Living with failure

Fallacies of distributed computing

  • original ones from Deutsch
    • emphasis on unreliable networks
    • there are some concrete examples and data in the data-intensive book, e.g. about failures in cloud environments
    • see also the Bailis/Kingsbury article

Other problems

  • pitfall: treating a remote call as if it were just a local function call
    • see "A Note on Distributed Computing", Waldo
  • unreliable clocks
  • nodes die / expect failures
  • (maybe) end-to-end system design
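
A minimal sketch of why a remote call is not a local function call, and of what "expect failures" can look like in code. Everything here is an illustrative assumption and not taken from any of the cited sources: `make_flaky` is a hypothetical stand-in for a real network call, and `call_with_retries` and `RemoteCallFailed` are made-up names. The sketch measures elapsed time with a monotonic clock, because wall-clock time can jump, and it retries only because the simulated call is assumed to be idempotent.

```python
import time


class RemoteCallFailed(Exception):
    """All attempts failed; what happened on the remote side is unknown."""


def call_with_retries(remote_fn, attempts=3, base_delay=0.01, deadline_s=1.0):
    """Call remote_fn, retrying on timeouts and connection errors.

    Assumes remote_fn is idempotent: a timeout does not tell the caller
    whether the remote side executed the request or not.
    """
    start = time.monotonic()  # monotonic clock: not affected by wall-clock changes
    last_error = None
    for attempt in range(attempts):
        if time.monotonic() - start > deadline_s:
            break  # overall time budget exhausted, stop retrying
        try:
            return remote_fn()
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            # Exponential backoff between attempts.
            time.sleep(base_delay * 2 ** attempt)
    raise RemoteCallFailed("remote call did not succeed") from last_error


def make_flaky(failures):
    """Hypothetical stand-in for a network call: fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def remote_fn():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TimeoutError("simulated network timeout")
        return "ok"

    return remote_fn


if __name__ == "__main__":
    print(call_with_retries(make_flaky(2)))
```

A local call has none of this: it either returns or raises, and the caller never has to wonder whether the callee ran. The retry wrapper only hides some failures; callers still have to handle `RemoteCallFailed`.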

Conclusions

  • for theory: connect with the models chapter for a more formal treatment, which is required for consensus algorithms
  • for practice: the "let it crash" philosophy from Erlang and related papers
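
A rough sketch of the "let it crash" idea, written in Python as a loose analogy only. It is not Erlang/OTP's supervisor, and `supervise` and `make_crashy_worker` are hypothetical names. The worker does no defensive error handling; a separate supervisor restarts it from a clean state, up to a limit.

```python
def supervise(make_worker, max_restarts=3):
    """Run a fresh worker; if it crashes, build a new one and try again.

    make_worker is a factory, so each restart begins from a clean state
    instead of trying to repair whatever the crashed worker left behind.
    """
    restarts = 0
    while True:
        worker = make_worker()
        try:
            return worker()
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and let the failure propagate upward
            restarts += 1


def make_crashy_worker(crashes):
    """Hypothetical worker factory: the first `crashes` workers fail, later ones succeed."""
    state = {"started": 0}

    def factory():
        state["started"] += 1
        run_number = state["started"]

        def worker():
            if run_number <= crashes:
                raise RuntimeError("simulated crash")
            return f"done on run {run_number}"

        return worker

    return factory


if __name__ == "__main__":
    print(supervise(make_crashy_worker(2)))
```

The design choice is that recovery logic lives in one place (the supervisor) and the restart limit stops a permanently broken worker from looping forever.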

References

TODO: review these references, incorporate what's useful, remove the rest