Preface

A tour de force in distributed systems

sources: data-intensive ch. 8; db internals ch. 8

sequential, concurrent, parallel, distributed

sequential computing

[CODE SAMPLE]
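a possible sample for this section (a minimal Python sketch; the fake `fetch` just sleeps to stand in for I/O latency, and the URLs are placeholders):

```python
import time

def fetch(url):
    """Pretend to fetch a URL; the sleep stands in for I/O latency."""
    time.sleep(1)
    return f"contents of {url}"

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

start = time.time()
# Sequential: each request starts only after the previous one has finished.
results = [fetch(url) for url in urls]
print(f"fetched {len(results)} pages in {time.time() - start:.1f}s")  # ~3s
```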

concurrent computing

[CODE SAMPLE]
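a possible sample, reusing the fake `fetch` from the sequential sketch: the same three requests, but overlapped on a thread pool (a sketch, not the final sample):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Pretend to fetch a URL; the sleep stands in for I/O latency."""
    time.sleep(1)
    return f"contents of {url}"

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

start = time.time()
# Concurrent: the three requests overlap in time instead of running one by one.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fetch, urls))
print(f"fetched {len(results)} pages in {time.time() - start:.1f}s")  # ~1s
```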

side notes:

  • problems that arise in concurrent programming (deadlocks, race conditions, code complexity, difficulty of testing)
  • briefly describe concurrent programming models

async programming

[CODE SAMPLE]
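a possible sample: the same workload on a single thread, with the event loop switching between tasks at the await points (sketch only, same fake `fetch`):

```python
import asyncio
import time

async def fetch(url):
    await asyncio.sleep(1)  # simulated non-blocking I/O
    return f"contents of {url}"

async def main():
    urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
    start = time.time()
    # One thread, no locks: the event loop interleaves the coroutines.
    results = await asyncio.gather(*(fetch(url) for url in urls))
    print(f"fetched {len(results)} pages in {time.time() - start:.1f}s")  # ~1s

asyncio.run(main())
```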

parallel computing

[CODE SAMPLE]
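a possible sample: CPU-bound work split across processes so it actually runs on multiple cores (the prime-counting workload is made up for illustration):

```python
import time
from multiprocessing import Pool

def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )

if __name__ == "__main__":
    chunks = [50_000] * 4
    start = time.time()
    # Parallel: each chunk runs in its own process, on its own core.
    with Pool(processes=4) as pool:
        results = pool.map(count_primes, chunks)
    print(f"counted {sum(results)} primes in {time.time() - start:.1f}s")
```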

distributed computing

[CODE SAMPLE]
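a possible sample: the same work farmed out over the network via RPC. to keep the sketch self-contained the "workers" are local XML-RPC servers on different ports; in a real deployment each one would be a separate machine, and the calls could fail:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def count_primes(limit):
    """CPU-bound work done by each worker."""
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )

def start_worker(port):
    # One worker per port; in a real deployment this would be one per machine.
    server = SimpleXMLRPCServer(("127.0.0.1", port), logRequests=False)
    server.register_function(count_primes)
    threading.Thread(target=server.serve_forever, daemon=True).start()

ports = [9001, 9002, 9003]
for port in ports:
    start_worker(port)

# The client only sees network endpoints; it neither knows nor cares where
# they actually run (but now requests can be lost, delayed or duplicated).
workers = [ServerProxy(f"http://127.0.0.1:{port}") for port in ports]
results = [worker.count_primes(30_000) for worker in workers]
print(sum(results))
```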

reasons to distribute: performance (speed, scalability of resources, cost-efficiency), reliability

When your program is a local text editor, your users don't expect it to keep running if their laptop runs out of battery or its hard drive crashes. As a programmer, you can safely ignore those scenarios. But if your program is a web application, it's not acceptable for it to go offline because of a faulty hard drive in some datacenter or a power outage in Northern Virginia. Web applications need to serve thousands or millions of users, so it's not cost-effective to run them on a single host; you need multiple servers, and with more components the probability of faulty hardware increases. What's more, as you'll see in the next section, networks are even less reliable than computers: messages get lost or duplicated, and hosts become unreachable. The bottom line is that in distributed setups you need to assume things are going to break, even in unexpected ways, and design systems that keep working in the presence of errors.

Living with failure

fallacies of distributed computing

  • the original ones from Deutsch
    • emphasis on unreliable networks
    • there are some concrete examples and data in the data-intensive book, e.g. about failures in cloud environments
    • see also the Bailis/Kingsbury article

other problems

  • pitfall: treating a remote call as if it were just a function call.
    • see "A Note on Distributed Computing", Waldo
  • unreliable clocks
  • nodes die / expect failures
  • (maybe) end to end system design

Conclusions

for theory: connect with the models chapter for a more formal treatment, required for the consensus algorithms. for practice: the "let it crash" philosophy from Erlang and related papers.

References

TODO: review these references, incorporate what's useful, remove the rest

Replication

sources: http://book.mixu.net/distsys/replication.html; db internals ch. 11, 13; data-intensive ch. 9

touch briefly, just what's necessary to move to the consensus problem in the next chapter. show how to go from a single node, to a single leader, to fault tolerance, and the options and trade-offs of the different methods. TODO: see how to best distribute the material between the replication and consistency chapters

Replication

this is covered in the data-intensive apps book

  • definition
  • can serve two purposes: performance and reliability (relate to failures as described in first chapter)
  • touch briefly on the types of replication, and give examples of implementations of each: leader/followers, multi-leader, leaderless (e.g. Dynamo)
  • having redundancy introduces the problem of keeping replicas consistent with each other

primary/backup replication

the simplest form of replication, explained in distsys for fun and profit and in https://decentralizedthoughts.github.io/2019-11-01-primary-backup/. can it be used as an example of weak consistency to introduce the next section?

[CODE SAMPLES HERE]
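a possible starting point: a toy, in-process simulation of synchronous primary/backup replication. the class names and the synchronous "network" are made up; a real system sends these messages over an unreliable network and has to handle failover.

```python
class Replica:
    """A node holding a full copy of the data."""
    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value

class Primary(Replica):
    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups

    def write(self, key, value):
        # Apply the write locally, then forward it to every backup and wait
        # for all of them before acknowledging (synchronous replication).
        self.apply(key, value)
        for backup in self.backups:
            backup.apply(key, value)
        return "ok"

    def read(self, key):
        return self.store.get(key)

backups = [Replica("backup-1"), Replica("backup-2")]
primary = Primary("primary", backups)
primary.write("x", 42)
assert all(backup.store["x"] == 42 for backup in backups)
print(primary.read("x"))   # -> 42
```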

Conclusions

References

Consistency

this is covered in the database internals book

  • definition
  • consistency models: linearizability, sequential, causal, eventual (mention strong eventual/crdts here)
  • add a diagram going from stronger to weaker, with a tree of models/submodels and the systems that implement them

cap (and its problems)

summarize the conjecture and explain the problems with the theorem. give precise meaning to the terms: partitions compared with the failures from ch. 1, consistency compared with the models from the previous section. (maybe) a minor side note about the relation between FLP and CAP. explain what its actual value is in trade-off analysis for system design.

distributed transactions/2-phase commit

TODO: this overlaps with the basic consensus implementations from the next chapters; it may not be the best idea to introduce it here

https://www.the-paper-trail.org/post/2008-11-27-consensus-protocols-two-phase-commit/

[CODE SAMPLES HERE]
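a possible starting point: a toy two-phase commit with the participants as in-process objects. real 2PC exchanges these messages over the network and durably logs votes/decisions to survive crashes; none of that is modeled here, and the names are made up.

```python
class Participant:
    """A resource manager that can prepare, commit or abort a transaction."""
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self, txn):
        # Phase 1: promise to commit if asked (in a real system this means
        # durably logging the intent before replying).
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"

def two_phase_commit(txn, participants):
    # Phase 1: the coordinator collects votes from every participant.
    votes = [p.prepare(txn) for p in participants]
    # Phase 2: commit only if everyone voted yes, otherwise abort.
    if all(votes):
        for p in participants:
            p.commit(txn)
        return "committed"
    for p in participants:
        p.abort(txn)
    return "aborted"

participants = [Participant("orders-db"), Participant("payments-db", will_vote_yes=False)]
print(two_phase_commit("txn-1", participants))   # -> "aborted"
```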

2PC can be thought of as a solution to consensus, but it's not an interesting one because it's not fault tolerant (it cannot tolerate crashes of the coordinator). it can also be used to solve a different problem than consensus, i.e. atomically committing different writes on different nodes, versus replicating the same value on every node as in consensus; see e.g. Spanner, which uses both 2PC and Paxos. for fault tolerance we need more sophisticated consensus protocols, introduced in the next chapter.

Conclusions

References

Modeling distributed systems

failure models

pick one of:
  • crash-stop / crash-recovery / byzantine (data-intensive)
  • crash / omission / byzantine (db internals)
sources: https://alvaro-videla.com/2013/12/failure-modes-in-distributed-systems.html; http://cs.boisestate.edu/~amit/teaching/555/handouts/fault-tolerance-handout.pdf

note: omission faults include the network partitions used in CAP

synchrony models

synchronous, asynchronous, partially synchronous. source: https://decentralizedthoughts.github.io/2019-06-01-2019-5-31-models/

show practical examples of each synchrony model

FLP impossibility

how it can be worked around in practice: randomization, failure detectors (see the Chandra paper), partial synchrony. provide examples and trade-offs.

adversary

sources: https://decentralizedthoughts.github.io/2019-06-07-modeling-the-adversary/; https://decentralizedthoughts.github.io/2019-06-17-the-threshold-adversary/

Mapping to the real world

the data-intensive apps book has a good section about this

models help not only to understand the terms in which algorithms are stated and where they are applicable, but also to reason about real systems and analyze their trade-offs

Replicated state machines

most algorithms are described in terms of replicated state machines; explain them here

sources: https://decentralizedthoughts.github.io/2019-10-15-consensus-for-state-machine-replication/; the Schneider paper; (maybe) the Lampson paper

Definitions

From primary/backup to CFT consensus

these posts show how to go from primary/backup to fault-tolerant consensus. we may want to follow a similar route, for example introducing the lock-commit paradigm as an intermediate step before the more sophisticated algorithms (Raft, PBFT, etc.):
https://decentralizedthoughts.github.io/2020-09-13-synchronous-consensus-omission-faults/
https://decentralizedthoughts.github.io/2020-11-29-the-lock-commit-paradigm/
https://decentralizedthoughts.github.io/2020-11-30-the-lock-commit-paradigm-multi-shot-and-mixed-faults/

this may be a bit redundant with two-phase commit, though; perhaps we can do this instead of 2PC earlier. these algorithms are also more difficult than Streamlet, which is introduced later. perhaps it's best to just rewrite the previous primary/backup implementation as a state machine instead of introducing the more complex algorithms.

[CODE SAMPLES HERE]
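a possible sketch of that last idea: the earlier primary/backup example restated as a replicated state machine, i.e. replicas agree on an ordered log of commands and apply them, in order, to a deterministic state machine. names are made up, and ordering here is trivially provided by the single primary with no failures modeled.

```python
class KVStateMachine:
    """Deterministic state machine: the same log of commands yields the same state."""
    def __init__(self):
        self.data = {}

    def apply(self, command):
        op, key, value = command
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)

class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []                     # ordered log of commands
        self.state_machine = KVStateMachine()

    def append_and_apply(self, command):
        self.log.append(command)
        self.state_machine.apply(command)

class Primary(Replica):
    def __init__(self, name, backups):
        super().__init__(name)
        self.backups = backups

    def submit(self, command):
        # The single primary picks the order: append locally, then replicate
        # the log entry to every backup (synchronously, no failures modeled).
        self.append_and_apply(command)
        for backup in self.backups:
            backup.append_and_apply(command)

backups = [Replica("backup-1"), Replica("backup-2")]
primary = Primary("primary", backups)
primary.submit(("set", "x", 1))
primary.submit(("delete", "x", None))
assert all(r.state_machine.data == primary.state_machine.data for r in backups)
print(primary.state_machine.data)   # -> {}
```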

Conclusions

References

Introduction to Consensus

sources: data-intensive ch. 8, 9; db internals ch. 8; https://decentralizedthoughts.github.io/2019-06-27-defining-consensus/

  • definition
  • relation to consistency: consensus can be a way to implement linearizability in a system
  • different types of consensus based on their fault tolerance

Atomic broadcast

  • definition
  • it's the same problem as consensus, looked at from a different angle

this relation is well covered in the db internals book and in the Tendermint thesis

Safety and liveness properties

here or somewhere else, explain how this maps to other classifications of properties, e.g. agreement, integrity, validity, termination (termination = fault tolerance)

streamlet

sources: https://decentralizedthoughts.github.io/2020-05-14-streamlet/; https://eprint.iacr.org/2020/088; http://elaineshi.com/docs/blockchain-book.pdf chapter 7

this is a "textbook" protocol with good pedagogical value, so it's an obvious candidate for introducing some topics. NOTE: it's a permissioned blockchain protocol, so it kind of jumps ahead of the topics as currently organized in the book. nevertheless, it seems like a better approach to start with it as the first consensus example before moving into classical material like Paxos or Raft.

[CODE SAMPLES HERE]
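a possible sketch of Streamlet's core data structures and rules: notarization at 2n/3 votes, voting only for proposals that extend a longest notarized chain, and the three-consecutive-epochs finalization rule. leader selection, signatures, networking and the epoch clock are all omitted, and the happy-path scenario at the bottom is made up.

```python
from dataclasses import dataclass, field
from math import ceil

N = 4                      # number of nodes
QUORUM = ceil(2 * N / 3)   # at least 2n/3 votes notarize a block

@dataclass
class Block:
    epoch: int
    parent: "Block | None"              # block it extends (None = genesis)
    payload: str
    votes: set = field(default_factory=set)

    def chain_length(self):
        length, block = 0, self
        while block is not None:
            length, block = length + 1, block.parent
        return length

def notarized(block):
    # Genesis is notarized by convention.
    return block.epoch == 0 or len(block.votes) >= QUORUM

def maybe_vote(node_id, proposal, longest_notarized_len):
    # Vote only for a proposal that extends (one of) the longest notarized
    # chains this node has seen.
    if (proposal.parent is not None
            and notarized(proposal.parent)
            and proposal.parent.chain_length() >= longest_notarized_len):
        proposal.votes.add(node_id)

def finalized_tip(tip):
    # Three notarized blocks with consecutive epochs finalize the chain
    # up to the middle block of the three.
    block = tip
    while block and block.parent and block.parent.parent:
        first, middle = block.parent.parent, block.parent
        if (all(map(notarized, (first, middle, block)))
                and (first.epoch, middle.epoch) == (block.epoch - 2, block.epoch - 1)):
            return middle
        block = block.parent
    return None

# Happy path: every node votes in epochs 1..3, so the epoch-2 block is finalized.
genesis = Block(0, None, "genesis")
tip = genesis
for epoch in (1, 2, 3):
    proposal = Block(epoch, tip, f"block@{epoch}")
    for node in range(N):
        maybe_vote(node, proposal, tip.chain_length())
    tip = proposal

print(finalized_tip(tip).payload)   # -> "block@2"
```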

Conclusions

References

Crash fault-tolerant (CFT) consensus

explain Raft in detail, the rest briefly

historical notes

  • Paxos
  • (maybe) Viewstamped Replication
  • Zab
  • Chubby and "Paxos Made Live" https://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/papers/paper2-1.pdf (covers the issues encountered when taking Paxos to production)

Raft

http://thesecretlivesofdata.com/raft/ -> would be good to have more like this for other algorithms
https://github.com/ongardie/raft.tla/blob/master/raft.tla
the Raft paper

  • leader election
  • log replication

[CODE SAMPLES]
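a possible sketch for the leader-election part: the RequestVote handler's voting rule (at most one vote per term, granted only to candidates with an up-to-date log). networking, persistence, election timeouts and log replication are omitted, and the two-follower scenario at the bottom is made up.

```python
class RaftNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None
        self.log = []                     # list of (term, command) entries

    def last_log_info(self):
        # (last log term, last log index); an empty log is (0, 0) as in the paper.
        if not self.log:
            return (0, 0)
        return (self.log[-1][0], len(self.log))

    def handle_request_vote(self, term, candidate_id, last_log_term, last_log_index):
        # Reject candidates from stale terms.
        if term < self.current_term:
            return (self.current_term, False)
        # A higher term resets any vote granted in a previous term.
        if term > self.current_term:
            self.current_term, self.voted_for = term, None
        # Grant at most one vote per term, and only to candidates whose log is
        # at least as up to date as ours (compare last term, then length).
        up_to_date = (last_log_term, last_log_index) >= self.last_log_info()
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return (self.current_term, True)
        return (self.current_term, False)

# A candidate ("n1") with an empty log asks two followers for votes in term 1.
followers = [RaftNode("n2"), RaftNode("n3")]
granted = sum(f.handle_request_vote(1, "n1", 0, 0)[1] for f in followers)
cluster_size = len(followers) + 1             # the candidate votes for itself
print("elected" if 2 * (granted + 1) > cluster_size else "not elected")
```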

Conclusions

References

Byzantine fault-tolerant (BFT) consensus

sources: real-world crypto ch. 12; https://pontem.network/posts/aptosbft-all-you-need-to-know-about-the-bft-consensus-in-aptos

The Byzantine generals problem

Practical Byzantine fault-tolerance PBFT

Conclusions

References

Bitcoin and permissionless consensus

sources: real-world crypto ch. 12; https://pontem.network/posts/aptosbft-all-you-need-to-know-about-the-bft-consensus-in-aptos

before Bitcoin, BFT was a theoretical problem and the solutions were impractical, but a permissionless network requires assuming arbitrary/malicious nodes.

Proof of authority, proof of work, proof of stake
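for the proof-of-work part, a toy puzzle in the spirit of Bitcoin's could work here: find a nonce such that the hash of the block data plus the nonce falls below a target. the block data is a placeholder and the difficulty is tiny so the loop finishes instantly; a sketch only.

```python
import hashlib

def proof_of_work(block_data: bytes, difficulty_bits: int = 16) -> int:
    """Find a nonce so that sha256(block_data || nonce) is below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

nonce = proof_of_work(b"previous-hash|transactions")
print(f"found nonce {nonce}")   # expensive to produce, cheap to verify
```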

Conclusions

References

Proof of Stake consensus

real-world crypto ch. 12

Tendermint

[CODE SAMPLE]
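a possible sketch of a single Tendermint round's vote counting: propose, prevote, precommit, with a +2/3 threshold at each step. networking, timeouts, round changes, locking and validator signatures are all omitted, and the validator names and happy-path scenario are made up.

```python
from collections import Counter

VALIDATORS = ["v1", "v2", "v3", "v4"]

def two_thirds(n):
    # "+2/3": strictly more than two thirds of the validators.
    return n * 2 // 3 + 1

def tally(votes):
    """votes: validator -> value (None means a nil vote). Returns the value
    that gathered +2/3 of the votes, if there is one."""
    counts = Counter(v for v in votes.values() if v is not None)
    for value, count in counts.items():
        if count >= two_thirds(len(VALIDATORS)):
            return value
    return None

def run_round(proposal, prevote_of, precommit_of):
    # Step 1: the round's proposer broadcasts `proposal` (broadcast omitted).
    # Step 2: validators prevote for the proposal (or nil).
    prevotes = {v: prevote_of(v, proposal) for v in VALIDATORS}
    polka = tally(prevotes)                 # +2/3 prevotes for one value
    # Step 3: validators precommit only if they saw a polka (else nil).
    precommits = {v: precommit_of(v, polka) for v in VALIDATORS}
    return tally(precommits)                # +2/3 precommits -> commit

# Happy path: everyone is honest and online, so the block commits in round 0.
decision = run_round(
    proposal="block-42",
    prevote_of=lambda validator, proposal: proposal,
    precommit_of=lambda validator, polka: polka,
)
print(decision)   # -> "block-42"
```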

Diem

previously Libra; based on HotStuff (need to make the differences with Tendermint explicit, and if they are not big enough it's probably not worth getting into the details here, perhaps just mentioning it)

Algorand?

the Algorand paper; "Byzantine Agreement, Made Trivial"

Conclusions

References

Directed acyclic graph (DAG) consensus

sources: https://decentralizedthoughts.github.io/2022-06-28-DAG-meets-BFT/; https://www.paradigm.xyz/2022/07/experiment-narwhal-bullshark-cosmos-stack; "Bullshark: DAG BFT Protocols Made Practical"

Appendix: Summary of algorithms

make a table with each algorithm: synchrony and failure models it assumes, adversary threshold, performance, code complexity

References/Notes/TODO