search_insights

Advanced
Indexing

The difference between a 1ms query and a timeout. From standard B-Trees to high-dimensional spatial search.

Why Indices?

"An index is a trade-off: You pay in disk space and write latency to buy read speed."

• Without index: O(N) Scan

• With index: O(log N) or O(1)

1. B-Tree Internals

The Multi-Way Tree

Standard in Postgres/MySQL. Optimized for block storage. Unlike an AVL tree, a B-Tree has high Fan-out (hundreds of children per node) to minimize disk seeks.

Fan-outShallow Tree
Page SizeAligned to OS
Range ScansLinked Leaves

2. Inverted Index (Search)

The heart of Elasticsearch and Lucene. Instead of mapping Doc → Words, we map Word → [Doc IDs].

# Term Dictionary

"distributed" → [1, 4, 12, 99]

"consensus" → [4, 88]

// Intersection for query "distributed consensus" = [4]

3. Spatial Index (Geohash)

explore

How Uber finds drivers fast.

Standard indices don't work for 2D range queries (lat/lng). We use Geohashing to turn a 2D coordinate into a 1D string where proximity on earth = shared prefix in the index.

9q9hSan Francisco Box
9q9h...uA 10m x 10m square

"To find nearby drivers, just query for drivers with the same Geohash prefix in a standard B-Tree."

Interview Guidance

The "Uber/Yelp" Case

When asked to design a geospatial system, don't say "Store lat/lng and calculate distance." Say **"We can use Geohashing or Quadtrees for efficient range queries."**

The Cost of Write

If asked why you haven't indexed every column, explain that every index adds to the WAL and makes writes slower. Select indices based on read patterns.