A tour de force in distributed systems

references: Designing Data-Intensive Applications ch. 8, Database Internals ch. 8

sequential, concurrent, parallel, distributed

sequential computing

[CODE SAMPLE]

concurrent computing

[CODE SAMPLE]

side notes:

  • problems that arise in concurrent programming (deadlocks, race conditions, code complexity, difficulty of testing)
  • briefly describe concurrent programming models
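
A minimal sketch of the race condition mentioned in the side notes, assuming Python threads and a shared counter. The names `Counter`, `unsafe_increment`, `safe_increment` and `run_workers` are made up for illustration. The unsafe version splits the update into a read and a write, so two threads can read the same value and one update gets lost; whether that actually happens on a given run depends on the interpreter and on scheduling, which is exactly what makes these bugs hard to test for. The safe version guards the update with a lock.

```python
import threading


class Counter:
    """Shared counter with an unsafe and a lock-protected increment (illustrative)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write in two steps: another thread may update
        # self.value between the read and the write, losing an increment.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write atomic with respect to
        # other threads that also take the lock.
        with self._lock:
            self.value += 1


def run_workers(increment, n_threads=4, n_iters=10_000):
    """Call `increment` n_iters times from each of n_threads threads."""

    def work():
        for _ in range(n_iters):
            increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    unsafe = Counter()
    run_workers(unsafe.unsafe_increment)
    print("unsafe:", unsafe.value)  # may be less than 40000, varies per run

    safe = Counter()
    run_workers(safe.safe_increment)
    print("safe:", safe.value)
```

The lock fixes the lost updates at the cost of serializing the increments, and taking several locks in inconsistent order is how deadlocks appear.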

async programming

[CODE SAMPLE]

parallel computing

[CODE SAMPLE]

distributed computing

[CODE SAMPLE]

reasons to distribute: performance (speed, scalability of resources, cost-efficiency), reliability

When your program is a local text editor, your users don't expect it to keep running if their laptop runs out of battery or the hard drive crashes. As a programmer, you can safely ignore those scenarios. But if your program is a web application, it's not acceptable to go offline because of a faulty hard drive in some datacenter or a power outage in Northern Virginia. Web applications need to serve thousands or millions of users, so it's not cost-effective to run them on a single host; you need multiple servers, and the more components you have, the higher the probability that at least one of them is faulty. What's more, as you'll see in the next section, networks are even less reliable than computers: messages get lost or duplicated, and hosts become unreachable. The bottom line is that in distributed setups you need to assume things are going to break, sometimes in unexpected ways, and design systems that continue working even in the presence of errors.
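
One way to picture "continue working in the presence of errors" is a client that retries a failed request and then fails over to another replica. The sketch below is only an illustration under simplifying assumptions: replicas are simulated as in-process functions rather than real network hosts, and `ReplicaUnavailable`, `make_replica` and `call_with_failover` are hypothetical names, not part of any library.

```python
import time


class ReplicaUnavailable(Exception):
    """Raised by a simulated replica that cannot serve the request."""


def make_replica(name, failures_before_success):
    """Return a fake replica that fails a fixed number of times, then answers."""
    state = {"remaining": failures_before_success}

    def handle(request):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ReplicaUnavailable(name)
        return f"{name} handled {request}"

    return handle


def call_with_failover(replicas, request, attempts_per_replica=2, backoff_seconds=0.0):
    """Try each replica in order, retrying with exponential backoff before moving on."""
    last_error = None
    for replica in replicas:
        for attempt in range(attempts_per_replica):
            try:
                return replica(request)
            except ReplicaUnavailable as error:
                last_error = error
                # Wait a little longer after each failed attempt.
                time.sleep(backoff_seconds * (2 ** attempt))
    # Every replica failed every attempt: surface the failure to the caller.
    raise RuntimeError("all replicas failed") from last_error


if __name__ == "__main__":
    # Replica "a" stays down for this request; "b" fails once, then recovers.
    replicas = [make_replica("a", 5), make_replica("b", 1)]
    print(call_with_failover(replicas, "GET /"))  # prints "b handled GET /"
```

Retrying is only safe when the request is idempotent: if a reply is lost rather than the request, the retry makes the server process the same request twice, which is one way the duplicated messages mentioned above arise.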