Replication Strategies
Keeping a copy of the same data on multiple machines. It sounds simple until you ask: "Who accepts the writes?"
Why Replicate?
High Availability
If one node goes down (and it will), the system keeps running using the replica.
Read Scaling
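Send writes to one node, but serve reads from 10 replicas. Read throughput grows roughly linearly with the number of replicas. Write throughput does not, because every write still goes through one node.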
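A minimal sketch of this read/write split, assuming a hypothetical in-memory `ReplicatedStore` that copies each write to every replica immediately. Real followers lag behind, so reads can be slightly stale. Reads are spread round-robin; that routing policy is an assumption, not the only option.

```python
from __future__ import annotations

import itertools


class ReplicatedStore:
    """Hypothetical toy router: every write goes to the leader, reads rotate across replicas.

    Assumption for brevity: the leader copies each write to all replicas
    immediately. Real followers lag behind, so reads may be slightly stale.
    """

    def __init__(self, num_replicas: int) -> None:
        self.leader: dict[str, str] = {}
        self.replicas: list[dict[str, str]] = [{} for _ in range(num_replicas)]
        self._next = itertools.cycle(range(num_replicas))  # round-robin order
        self.reads_served = [0] * num_replicas  # read load per replica

    def write(self, key: str, value: str) -> None:
        self.leader[key] = value  # the single node that accepts writes
        for replica in self.replicas:
            replica[key] = value

    def read(self, key: str) -> str | None:
        i = next(self._next)  # spread reads evenly across replicas
        self.reads_served[i] += 1
        return self.replicas[i].get(key)
```

With `ReplicatedStore(10)`, 1000 reads of the same key leave each replica serving 100 of them while the leader serves none. Adding replicas divides the read load further, but every write still lands on the leader.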
Latency
Keep a copy of the data geographically close to users (e.g., US, EU, Asia).
Single Leader (Active-Passive)
The standard for most databases (Postgres, MySQL).
- 1. Leader: Accepts ALL writes. Replicates them to followers.
- 2. Followers: Read-only. They apply the leader's replication log.
If the Leader dies, a Follower must be promoted to Leader (failover). If replication was asynchronous and unfinished, writes the old Leader had already acknowledged are lost.
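The failover problem can be sketched with a leader that appends writes to an ordered log and ships it to followers later. `Node`, `Cluster`, and the "promote the follower with the longest log" rule are hypothetical simplifications. The point is that an acknowledged write that never left the leader does not survive promotion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list[tuple[str, str]] = field(default_factory=list)  # ordered (key, value) writes
    data: dict[str, str] = field(default_factory=dict)

    def apply(self, entry: tuple[str, str]) -> None:
        self.log.append(entry)
        key, value = entry
        self.data[key] = value


class Cluster:
    """Hypothetical single-leader cluster with asynchronous log shipping."""

    def __init__(self, follower_names: list[str]) -> None:
        self.leader = Node("leader")
        self.followers = [Node(n) for n in follower_names]

    def write(self, key: str, value: str) -> None:
        # The leader accepts and acknowledges the write; no follower has it yet.
        self.leader.apply((key, value))

    def replicate(self) -> None:
        # Ship each follower the log entries it has not applied, in log order.
        for f in self.followers:
            for entry in self.leader.log[len(f.log):]:
                f.apply(entry)

    def fail_over(self) -> Node:
        # Leader crashed: promote the most up-to-date follower (longest log).
        new_leader = max(self.followers, key=lambda f: len(f.log))
        self.followers.remove(new_leader)
        self.leader = new_leader
        return new_leader
```

If the cluster writes `a`, replicates, then writes `b` and the leader crashes before the next `replicate()`, the promoted follower holds only `a`. The write of `b` was acknowledged to the client and is now gone.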
Multi-Leader (Active-Active)
What if you have one datacenter in the US and one in the EU? Routing every EU write to a US Leader adds a transatlantic round trip to each write, which is too slow.
Solution: One Leader per datacenter. They sync with each other asynchronously.
Conflict Hell
User A updates the title to "Foo" in the US.
At the same time, User B updates the title to "Bar" in the EU.
Who wins? Last Write Wins (LWW), which silently discards one of the updates? Merge?
Multi-leader is complex. Avoid unless necessary.
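A minimal sketch of Last Write Wins, assuming each write carries the wall-clock timestamp of the leader that accepted it plus a leader ID as a tie-breaker (both assumptions; real systems differ in how they stamp writes). Every leader applying the same rule converges on the same value, but the losing write is simply thrown away.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time at the accepting leader (assumed)
    leader_id: str    # tie-breaker when timestamps are equal


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last Write Wins: keep the write with the larger (timestamp, leader_id)."""
    return max(a, b, key=lambda v: (v.timestamp, v.leader_id))
```

For the example above, `lww_merge(Versioned("Foo", 1000.001, "us"), Versioned("Bar", 1000.002, "eu"))` keeps "Bar" no matter which argument comes first, so both datacenters agree. "Foo" is lost without any error, and if the US clock runs slightly behind, the write that actually happened later can be the one discarded.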
Leaderless (Dynamo-style)
No Leader, No Bottleneck
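Used by Amazon's original Dynamo, Cassandra, Riak, and Voldemort. (Despite the name, AWS DynamoDB uses a different, leader-based architecture.)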
The client sends the write to any node (the coordinator). That node forwards it to all N replicas in parallel and reports success once W of them acknowledge.
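When reading, the client asks R nodes. If one returns stale data, it is overwritten with the newer value from the others (read repair). As long as W + R > N, at least one of the R nodes has the latest acknowledged write.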
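The write/read path can be sketched as follows, assuming the caller supplies an increasing integer version per key (a simplification; real systems use timestamps or version vectors). `Replica` and `LeaderlessCluster` are hypothetical names. The sketch sends writes to N replicas, requires W acknowledgements, reads from R replicas, and repairs any stale replica it queried.

```python
from __future__ import annotations


class Replica:
    def __init__(self) -> None:
        self.store: dict[str, tuple[int, str]] = {}  # key -> (version, value)
        self.up = True


class LeaderlessCluster:
    """Hypothetical Dynamo-style quorum sketch; versions are supplied by the caller."""

    def __init__(self, n: int, w: int, r: int) -> None:
        if w + r <= n:
            raise ValueError("need W + R > N so read and write quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r

    def write(self, key: str, version: int, value: str) -> bool:
        acks = 0
        for rep in self.replicas:  # sent to all N replicas (sequentially here)
            if not rep.up:
                continue
            current = rep.store.get(key)
            if current is None or current[0] < version:  # never overwrite newer data
                rep.store[key] = (version, value)
            acks += 1
        return acks >= self.w  # success only if at least W replicas acknowledged

    def read(self, key: str) -> str | None:
        live = [rep for rep in self.replicas if rep.up][: self.r]  # first R reachable
        if len(live) < self.r:
            raise RuntimeError("fewer than R replicas reachable")
        responses = [(rep, rep.store.get(key)) for rep in live]
        found = [v for _, v in responses if v is not None]
        if not found:
            return None
        newest = max(found)  # highest version wins
        for rep, v in responses:  # read repair: overwrite stale replicas we queried
            if v != newest:
                rep.store[key] = newest
        return newest[1]
```

With N=3, W=2, R=2: if replica 0 is down while version 2 is written, the write still succeeds on replicas 1 and 2. A later read that hits replicas 0 and 1 sees both versions, returns version 2, and writes it back to replica 0. A write that misses the W threshold is not rolled back on the replicas that did accept it in this sketch.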
A background anti-entropy process periodically compares replicas using Merkle trees and copies over any missing data.
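A minimal sketch of the Merkle tree comparison, assuming keys are split into a power-of-two number of buckets by hashing the key, and each leaf hashes one bucket's contents. The helper names (`bucketize`, `build_tree`, `diff`) are hypothetical. Two replicas first compare root hashes and only descend into subtrees whose hashes differ, so identical data costs a single comparison.

```python
from __future__ import annotations

import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucketize(data: dict[str, str], num_buckets: int) -> list[bytes]:
    """Assign each key to a bucket by hashing it; serialize each bucket deterministically."""
    buckets: list[list[str]] = [[] for _ in range(num_buckets)]
    for key in sorted(data):
        idx = int.from_bytes(h(key.encode())[:4], "big") % num_buckets
        buckets[idx].append(f"{key}={data[key]}")
    return ["\n".join(b).encode() for b in buckets]


def build_tree(buckets: list[bytes]) -> list[list[bytes]]:
    """Return the tree's levels, leaves first and root last."""
    n = len(buckets)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two bucket count")
    levels = [[h(b) for b in buckets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Return the leaf buckets that differ, descending only into mismatched subtrees."""
    if a[-1] == b[-1]:
        return []  # roots match: the replicas hold identical data
    suspects = [0]  # index of the (mismatched) root
    for depth in range(len(a) - 2, -1, -1):  # from just below the root down to the leaves
        suspects = [c for i in suspects for c in (2 * i, 2 * i + 1) if a[depth][c] != b[depth][c]]
    return suspects
```

If two replicas hold the same 100 keys and differ only in the value of `k42`, `diff` returns just the one bucket containing `k42`, and only that bucket's keys need to be exchanged.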
Sync vs Async Replication
Synchronous
"Wait for followers"
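- ✓ No acknowledged write is lost if the Leader crashes: the followers already have it.
- ✗ Write latency = max(follower latencies): every write waits for the slowest follower.
- ✗ If any follower dies, writes fail (or stall) until it recovers or is removed.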
Asynchronous
"Fire and forget"
- ✓ Fast writes. Leader confirms immediately.
- ✓ Resilient to slow followers.
- ✗ Data loss: writes acknowledged before reaching a follower are lost if the Leader crashes.
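The trade-off above can be sketched with a hypothetical `Leader` that either waits for all followers or acknowledges immediately and ships writes later. Follower latencies are fixed numbers, a down follower is modeled as `float("inf")`, and the leader's own write time is ignored; all of these are simplifying assumptions.

```python
from __future__ import annotations


class Leader:
    """Hypothetical leader that acks writes either synchronously or asynchronously."""

    def __init__(self, follower_latencies_ms: list[float], synchronous: bool) -> None:
        self.follower_latencies_ms = follower_latencies_ms  # float("inf") = follower down
        self.synchronous = synchronous
        self.log: list[str] = []      # every write the leader has stored
        self.pending: list[str] = []  # acked but not yet on any follower (async only)

    def write(self, value: str) -> float:
        """Store a write; return how long (ms) the client waits for the ack."""
        self.log.append(value)
        if self.synchronous:
            # Wait for all followers: the slowest one sets the latency, and a
            # down follower (inf) means the write never completes.
            return max(self.follower_latencies_ms)
        self.pending.append(value)  # ack now, ship to followers later
        return 0.0  # local write time ignored in this sketch

    def ship(self) -> None:
        """Async background step: followers catch up on pending writes."""
        self.pending.clear()

    def crash(self) -> list[str]:
        """Leader dies: return acknowledged writes no follower has (lost)."""
        lost, self.pending = self.pending, []
        return lost
```

With followers at 5, 12, and 40 ms, a synchronous write waits 40 ms but a crash loses nothing. An asynchronous write returns at once, but anything written after the last `ship()` is lost when the leader crashes.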