Design a
URL Shortener
tinyurl.com, bit.ly. It seems simple: Map a long string to a short string. But how do you generate 100 billion unique short strings without collisions?
Requirements
Functional
- 1. Given a Long URL, return a unique Short URL.
- 2. Clicking the Short URL redirects to the Long URL.
- 3. Short URLs should be as short as possible.
- 4. Optional: Custom aliases (e.g., bit.ly/my-cool-link).
Non-Functional
- 1. High Availability: Redirects must not fail.
- 2. Low Latency: Redirection must be instant.
- 3. Read Heavy: 100:1 Read/Write ratio.
API Design
Response: { "shortUrl": "https://tiny.url/AbCd12" }
301 vs 302 Redirect?
301 (Permanent) = Browser caches it. Less load on server, but no analytics.
302 (Temporary) = Always hits server. Good for analytics, higher load.
Database Schema
We need to store billions of records. No relationships needed. NoSQL (DynamoDB, Cassandra) or Key-Value (Riak) is best.
Hash Function vs ID Generation
Approach 1: Hashing
MD5(long_url) = 206b...837 (128-bit). Take first 7 chars?
If two users shorten the same URL, they get the same hash. Or worse, different URLs collide on first 7 chars.
Approach 2: Unique ID + Base62
Generate a unique integer ID (1, 2, ... 100000). Convert to Base62 ([a-z], [A-Z], [0-9]).
ID: 1000000000 → "15FTGg"
Guaranteed unique. Shortest possible string.
Generating Unique IDs
Distributed ID Generator
A single database `AUTO_INCREMENT` won't scale. We need a distributed way to get unique numbers.
Twitter's algorithm. Uses timestamp + worker ID + sequence. Fast & sortable.
Pre-generate 1M keys. Load them into memory. Hand them out. Fast but complex state.
Server 1 takes [1-1M]. Server 2 takes [1M-2M]. Simple.
Scaling Reads & Writes
Since reads are 100x writes, we prioritize read speed.
Checks Cache first. If miss, check DB & update Cache.
Reads are ~1ms.