Replication Strategies

Keeping a copy of the same data on multiple machines. It sounds simple until you ask: "Who accepts the writes?"

Why Replicate?

High Availability

If one node goes down (and it will), the system keeps running using the replica.

Read Scaling

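Send writes to one node, but read from 10 replicas. Read throughput grows roughly in proportion to the number of replicas, as long as the application can tolerate reads that lag slightly behind the latest write.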

Latency

Keep a copy of data geographically close to the user (e.g., US, EU, Asia).

Single Leader (Active-Passive)

The standard for most databases (Postgres, MySQL).

  1. Leader: Accepts ALL writes. Replicates them to followers.
  2. Followers: Read-only. They apply the leader's replication log.
The Risk:

If the Leader dies, you must promote a Follower. Any writes the Leader had accepted but not yet replicated to that Follower are lost.

[Diagram: one LEADER (Write + Read) replicating down to two FOLLOWERs (Read Only)]
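
The leader/follower flow and the failover risk can be sketched in a few lines of Python. This is a toy in-memory model, not how Postgres or MySQL implement replication; the `Leader` and `Follower` classes and the `lost_writes` variable are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Read-only replica that replays the leader's log in order."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply(self, log: list) -> None:
        # Replay only the entries this follower has not seen yet.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


@dataclass
class Leader:
    """Accepts every write and records it in an append-only log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value) -> None:
        self.data[key] = value
        self.log.append((key, value))


leader = Leader()
follower = Follower()

leader.write("title", "Foo")
follower.apply(leader.log)     # follower catches up
leader.write("title", "Bar")   # accepted, but not yet replicated

# If the leader died now and this follower were promoted,
# every entry past `follower.applied` would be gone.
lost_writes = len(leader.log) - follower.applied
```

Here the follower still holds "Foo" while the leader holds "Bar", so promoting the follower at this moment would drop one write.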

Multi-Leader (Active-Active)

What if you have one datacenter in the US and one in the EU? Routing every EU write to a Leader in the US is too slow.

Solution: One Leader per datacenter. They sync with each other asynchronously.

Conflict Hell

User A updates title to "Foo" in US.
User B updates title to "Bar" in EU.

Who wins? Last Write Wins (LWW)? Merge?
Multi-leader is complex. Avoid unless necessary.
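
To make the "Foo" versus "Bar" conflict concrete, here is a minimal sketch of Last Write Wins. It assumes each write carries a wall-clock timestamp and a datacenter id used as a tie-breaker; the `Versioned` type and `lww_merge` function are hypothetical, not taken from any particular database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time of the write (assumed; clocks can drift)
    origin: str       # datacenter id, used only to break exact ties


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the highest (timestamp, origin) pair."""
    return max(a, b, key=lambda v: (v.timestamp, v.origin))


us_write = Versioned("Foo", timestamp=100.0, origin="us")
eu_write = Versioned("Bar", timestamp=100.5, origin="eu")

# Both datacenters apply the same rule, so they converge on one value.
winner = lww_merge(us_write, eu_write)
```

Because the rule is deterministic and does not depend on argument order, both leaders end up with "Bar". The cost is that User A's "Foo" is silently discarded, and a skewed clock can make an older write win.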

Leaderless (Dynamo-style)

No Leader, No Bottleneck

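Used by Cassandra, Riak, and the original Amazon Dynamo system that gave the style its name. (The managed DynamoDB service, despite the name, replicates each partition through a leader.)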

The client sends the write to any node. That node forwards it to N replicas in parallel.

Read Repair

When reading, the client asks multiple nodes. If one returns old data, the client writes the newer value from the other nodes back to the stale one.
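
A minimal sketch of read repair, assuming each replica is a plain dict that maps a key to a `(version, value)` pair and that a higher version number means newer data. The `read_with_repair` function is hypothetical; real systems typically use timestamps or vector clocks rather than a simple counter.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value,
    and overwrite any replica that was missing or stale."""
    responses = [(replica, replica.get(key)) for replica in replicas]
    found = [entry for _, entry in responses if entry is not None]
    if not found:
        return None

    newest = max(found, key=lambda entry: entry[0])  # highest version wins

    for replica, seen in responses:
        if seen is None or seen[0] < newest[0]:
            replica[key] = newest  # repair the stale replica in place

    return newest[1]


replicas = [
    {"title": (2, "Bar")},  # up to date
    {"title": (1, "Foo")},  # stale
    {},                     # missed the write entirely
]
value = read_with_repair(replicas, "title")
```

After the call, all three replicas hold version 2. Note that only keys that are actually read get repaired this way, which is why a background process is also needed.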

Anti-Entropy

A background process continuously compares replicas, typically using Merkle trees to locate differences without transferring the full dataset, and copies over any missing data.
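
The idea behind the Merkle tree comparison can be sketched as follows: hash ranges of keys, compare the hashes, and descend only into ranges that differ. This is a simplified illustration, not the algorithm of any specific database; `diff_keys`, `merkle_root`, and `leaf_hashes` are hypothetical helpers, and a real implementation would cache the tree instead of rehashing at every level.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(store: dict, keys: list) -> list:
    # One leaf per key; a missing key hashes differently from any stored value.
    return [h(repr((k, store.get(k))).encode()) for k in keys]


def merkle_root(hashes: list) -> bytes:
    # Combine leaf hashes pairwise (by halves) into a single root hash.
    if len(hashes) == 1:
        return hashes[0]
    mid = len(hashes) // 2
    return h(merkle_root(hashes[:mid]) + merkle_root(hashes[mid:]))


def diff_keys(a: dict, b: dict, keys: list) -> list:
    """Return the keys on which replicas `a` and `b` disagree,
    skipping every key range whose root hashes already match."""
    if not keys:
        return []
    if merkle_root(leaf_hashes(a, keys)) == merkle_root(leaf_hashes(b, keys)):
        return []  # whole range is in sync, no need to look further
    if len(keys) == 1:
        return keys
    mid = len(keys) // 2
    return diff_keys(a, b, keys[:mid]) + diff_keys(a, b, keys[mid:])


replica_a = {"k1": "a", "k2": "b", "k3": "c", "k4": "d"}
replica_b = {"k1": "a", "k2": "b", "k3": "stale"}

all_keys = sorted(set(replica_a) | set(replica_b))
out_of_sync = diff_keys(replica_a, replica_b, all_keys)
```

In this example the `k1`/`k2` half matches on the first comparison and is never inspected key by key; only `k3` and `k4` are reported as needing a sync.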

Sync vs Async Replication

Synchronous

"Wait for followers"

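  • No acknowledged write is lost if the Leader fails, as long as an up-to-date follower survives.
  • Write Latency = Max(Follower Latency), since the Leader waits for the slowest follower.
  • If one follower dies, the write fails (or stalls).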

Asynchronous

"Fire and forget"

  • Fast writes. Leader confirms immediately.
  • Resilient to slow followers.
  • Acknowledged writes are lost if the Leader crashes before followers catch up.
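
The trade-off between the two modes can be sketched with a toy model. It assumes followers are contacted in parallel (so the slowest one sets the latency), that there is at least one follower, and that asynchronous writes go into a backlog shipped later. `Follower`, `sync_write`, `async_write`, and `FollowerUnavailable` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


class FollowerUnavailable(Exception):
    pass


@dataclass
class Follower:
    latency_ms: float
    up: bool = True
    log: list = field(default_factory=list)

    def replicate(self, entry) -> float:
        if not self.up:
            raise FollowerUnavailable
        self.log.append(entry)
        return self.latency_ms


def sync_write(entry, followers) -> float:
    """Wait for every follower; the client sees the slowest one's latency.
    Raises FollowerUnavailable if any follower is down."""
    return max(f.replicate(entry) for f in followers)


def async_write(entry, backlog) -> float:
    """Queue the entry for later shipping and acknowledge immediately."""
    backlog.append(entry)
    return 0.0


followers = [Follower(latency_ms=5), Follower(latency_ms=40)]
sync_latency = sync_write("x=1", followers)      # bounded by the 40 ms follower

backlog = []
async_latency = async_write("x=2", backlog)      # acked before any follower has it
```

With synchronous replication both followers hold `x=1` by the time the client is acknowledged. With asynchronous replication `x=2` exists only on the leader and in its backlog, which is exactly the window in which a leader crash loses data.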