Consensus
Algorithms
How do distributed nodes agree on a single value (like "Who is the leader?") when messages can be lost and nodes can lie?
The Problem: Reaching Agreement
Two Generals Problem
Two armies (nodes) need to attack a city at the exact same time. They can only communicate by sending a messenger through enemy territory (unreliable network).
Consensus algorithms like Paxos and Raft solve a relaxed version of this: agreeing on a value assuming the network eventually delivers messages.
Paxos (The Theorist)
The original consensus algorithm by Leslie Lamport. Proven correct, but famously difficult to understand and implement.
Key Concept: Proposals
Nodes propose values with a sequence number. Higher numbers win. Requires a majority (Quorum) to accept a proposal.
Used by: Google Chubby, Apache Zookeeper (ZAB is similar).
"There is only one consensus protocol, and that is Paxos."
— Mike Burrows (Google)
Raft (The Pragmatist)
Understandable Consensus
Designed explicitly to be easier to understand than Paxos. It decomposes the problem into: Leader Election, Log Replication, and Safety.
Accepts all writes. Handles replication.
Passive. Responds to Leader. Becomes Candidate if Leader dies.
Active only during election. Asks for votes.
Leader Election
How does Raft pick a leader?
Step 1: Timeout
Followers expect a heartbeat from Leader every 100ms. If silence for 300ms (randomized), a Follower assumes Leader is dead.
Step 2: Request Vote
Follower becomes Candidate. Increments term counter. Sends "Vote for Me!" to all.
Step 3: Majority Wins
If a Candidate gets votes from a majority (N/2 + 1), it becomes Leader. It starts sending heartbeats to suppress other candidates.
Gossip Protocols
Epidemic Algorithms
Unlike Raft (Strong Consistency), Gossip is for Eventual Consistency. Nodes pick random peers and exchange information. Like a virus spreading.
How it works
- Every second, Node A picks random Node B.
- They exchange state ("I know X=5", "Oh, I thought X=4").
- Both update to the latest version.
- Information propagates exponentially.
Use Cases
• Cassandra (Cluster membership)
• DynamoDB (Failure detection)
• Bitcoin (Peer discovery)