Consistent
Hashing
The algorithm that keeps distributed systems from collapsing when you add a single new server. Scalability without mass migration.
The Problem: Mod N Hashing
Normally, we distribute data using `server_index = hash(key) % N`, where `N` is the number of servers.
// If N = 5, hash("user_1") % 5 = 4 -> Server 4 (CACHED DATA IS GONE!)
When `N` changes by 1, nearly all keys move to a different server. In a cache, this is a cache miss storm. In a database, this is an expensive migration.
The Solution: Hash Ring
We treat the hash space as a circle (e.g., [0, 2^32 - 1]). Servers and Keys are both mapped onto this ring.
"To find a key's server, locate the key on the ring and move clockwise until you hit the first server node."
Adding & Removing Nodes
When a server is added or removed, only the adjacent keys on the ring are affected.
Efficient Scaling
On average, only 1/N of the keys move. Massively reducing the migration burden.
High Availability
If a node dies, its load is automatically shifted to the next node in the ring.
Virtual Nodes (VNodes)
Real-world nodes have different capacities. Standard hashing might cluster servers unevenly, creating hot spots.
The VNode Trick
Instead of placing Server A once, we place it 100 times (A_1, A_2, ... A_100) using different hashes.
- • Balances data distribution significantly.
- • Allows heterogenous servers (GIVE more VNodes to bigger servers).
Interview Guidance
Mention DynamoDB
The Amazon Dynamo paper popularized this. It's the standard for Cassandra, Riak, and CDNs.
Don't forget the VNodes
Always mention VNodes to show you understand the 'Hot Spot' problem and non-uniform distribution.