cyclone

Consistent
Hashing

The algorithm that keeps distributed systems from collapsing when you add a single new server. Scalability without mass migration.

The Problem: Mod N Hashing

Normally, we distribute data using `server_index = hash(key) % N`, where `N` is the number of servers.

// If N = 4, hash("user_1") % 4 = 1 -> Server 1
// If N = 5, hash("user_1") % 5 = 4 -> Server 4 (CACHED DATA IS GONE!)

When `N` changes by 1, nearly all keys move to a different server. In a cache, this is a cache miss storm. In a database, this is an expensive migration.

The Solution: Hash Ring

We treat the hash space as a circle (e.g., [0, 2^32 - 1]). Servers and Keys are both mapped onto this ring.

CLOCKWISE SEARCH

"To find a key's server, locate the key on the ring and move clockwise until you hit the first server node."

restart_alt

Adding & Removing Nodes

When a server is added or removed, only the adjacent keys on the ring are affected.

Efficient Scaling

On average, only 1/N of the keys move. Massively reducing the migration burden.

High Availability

If a node dies, its load is automatically shifted to the next node in the ring.

Virtual Nodes (VNodes)

Real-world nodes have different capacities. Standard hashing might cluster servers unevenly, creating hot spots.

The VNode Trick

Instead of placing Server A once, we place it 100 times (A_1, A_2, ... A_100) using different hashes.

  • • Balances data distribution significantly.
  • • Allows heterogenous servers (GIVE more VNodes to bigger servers).

Interview Guidance

"I wouldn't use simple Mod N hashing because a change in cluster size would invalidate the entire cache. I'd use Consistent Hashing to limit data movement to 1/N."
Mention DynamoDB

The Amazon Dynamo paper popularized this. It's the standard for Cassandra, Riak, and CDNs.

Don't forget the VNodes

Always mention VNodes to show you understand the 'Hot Spot' problem and non-uniform distribution.