Advanced
Indexing
The difference between a 1ms query and a timeout. From standard B-Trees to high-dimensional spatial search.
Why Indices?
"An index is a trade-off: You pay in disk space and write latency to buy read speed."
• Without index: O(N) Scan
• With index: O(log N) or O(1)
1. B-Tree Internals
The Multi-Way Tree
Standard in Postgres/MySQL. Optimized for block storage. Unlike an AVL tree, a B-Tree has high Fan-out (hundreds of children per node) to minimize disk seeks.
2. Inverted Index (Search)
The heart of Elasticsearch and Lucene. Instead of mapping Doc → Words, we map Word → [Doc IDs].
# Term Dictionary
"distributed" → [1, 4, 12, 99]
"consensus" → [4, 88]
// Intersection for query "distributed consensus" = [4]
3. Spatial Index (Geohash)
How Uber finds drivers fast.
Standard indices don't work for 2D range queries (lat/lng). We use Geohashing to turn a 2D coordinate into a 1D string where proximity on earth = shared prefix in the index.
"To find nearby drivers, just query for drivers with the same Geohash prefix in a standard B-Tree."
Interview Guidance
The "Uber/Yelp" Case
When asked to design a geospatial system, don't say "Store lat/lng and calculate distance." Say **"We can use Geohashing or Quadtrees for efficient range queries."**
The Cost of Write
If asked why you haven't indexed every column, explain that every index adds to the WAL and makes writes slower. Select indices based on read patterns.