Latency vs. Throughput

The two numbers that dictate the performance of every system. Understanding the difference is the first step in architectural thinking.

The Core Definitions

Latency

The time it takes for a single unit of data to travel from point A to point B.

Units: milliseconds (ms), seconds (s)
Throughput

The amount of data that can be processed or transmitted in a given amount of time.

Units: bits/sec (bps), req/sec (RPS)

The Mental Model

The Highway Analogy

Imagine a highway connecting two cities.

  • Latency: How fast a single car travels from City A to City B (e.g., 1 hour).
  • Throughput: How many cars arrive at City B per hour (e.g., 1,000 cars/hour).
"Widening the highway (adding lanes) increases throughput, but does NOT reduce latency (travel time)."
Wider Pipe = More Throughput
Shorter Pipe = Less Latency
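The analogy above can be turned into a toy calculation (the numbers are hypothetical, chosen only to make the point):

```python
# Toy highway model: latency is the travel time of one car; throughput is
# how many cars arrive per hour. Numbers are illustrative, not real traffic data.

def throughput_cars_per_hour(lanes: int, cars_per_lane_per_hour: int) -> int:
    # Adding lanes multiplies throughput...
    return lanes * cars_per_lane_per_hour

def latency_hours(distance_miles: float, speed_mph: float) -> float:
    # ...but travel time depends only on distance and speed, not lane count.
    return distance_miles / speed_mph

# One lane vs four lanes: throughput quadruples, latency is unchanged.
assert throughput_cars_per_hour(4, 1000) == 4 * throughput_cars_per_hour(1, 1000)
assert latency_hours(60, 60) == 1.0  # 1 hour from City A to City B either way
```

The design point: lane count and travel speed are independent knobs, which is exactly why scaling out (more lanes) never shortens a single request's journey.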

Latency Numbers to Know

Jeff Dean's famous numbers. Memorize the orders of magnitude.

L1 Cache Reference ........................... 0.5 ns
Branch Mispredict ............................ 5 ns
L2 Cache Reference ........................... 7 ns
Mutex Lock/Unlock ............................ 25 ns
Main Memory Reference ........................ 100 ns
Compress 1K Bytes ............................ 3,000 ns (3 µs)
Send 2K Bytes over 1 Gbps Network ............ 20,000 ns (20 µs)
SSD Random Read .............................. 150,000 ns (150 µs)
Read 1 MB Sequentially from Memory ........... 250,000 ns (250 µs)
Round Trip within Datacenter ................. 500,000 ns (0.5 ms)
Disk Seek .................................... 10,000,000 ns (10 ms)
Send Packet CA -> Netherlands -> CA .......... 150,000,000 ns (150 ms)
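The orders of magnitude matter more than the exact values. A quick sketch using the table's numbers shows how steep the cliffs between tiers are:

```python
# Jeff Dean's numbers from the table above, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "main_memory": 100,
    "ssd_random_read": 150_000,
    "disk_seek": 10_000_000,
    "cross_atlantic_round_trip": 150_000_000,
}

# One disk seek costs as much as 100,000 main-memory references.
assert LATENCY_NS["disk_seek"] / LATENCY_NS["main_memory"] == 100_000

# One CA -> Netherlands -> CA round trip costs 1,000 SSD random reads.
assert LATENCY_NS["cross_atlantic_round_trip"] / LATENCY_NS["ssd_random_read"] == 1_000

# Main memory is 200x slower than L1 cache.
assert LATENCY_NS["main_memory"] / LATENCY_NS["l1_cache"] == 200
```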

The Fundamental Trade-off

Optimizing for Latency

  • Moving computation closer to the user (edge)
  • Caching hot data in memory
  • Removing processing steps
  • Using faster hardware
Goal: Response Time

Optimizing for Throughput

  • Batch processing requests
  • Pipelining instructions
  • Parallel processing (MapReduce)
  • Asynchronous queues
Goal: Scale / Volume

Crucial Insight: Batching improves throughput but worsens latency for individual items.
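The batching trade-off is easy to quantify. A minimal sketch, assuming a hypothetical fixed 10 ms per-request overhead and 1 ms of processing per item:

```python
# Hypothetical costs: each request carries a fixed overhead (network round
# trip, auth, parsing); each item inside it costs a small processing fee.
OVERHEAD_MS = 10
PER_ITEM_MS = 1

def time_per_item_ms(batch_size: int) -> float:
    # Throughput view: the fixed overhead is amortized across the batch.
    return (OVERHEAD_MS + PER_ITEM_MS * batch_size) / batch_size

def worst_case_latency_ms(batch_size: int, arrival_gap_ms: float) -> float:
    # Latency view: the first item waits for the batch to fill, then for
    # the whole batch to be processed.
    return (batch_size - 1) * arrival_gap_ms + OVERHEAD_MS + PER_ITEM_MS * batch_size

# Throughput improves: per-item cost drops from 11 ms to 1.1 ms.
assert time_per_item_ms(1) == 11.0
assert time_per_item_ms(100) == 1.1

# Latency worsens: with items arriving every 5 ms, the first item of a
# 100-item batch waits 605 ms instead of 11 ms.
assert worst_case_latency_ms(1, 5) == 11
assert worst_case_latency_ms(100, 5) == 605
```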

How to Optimize

1. Parallelism

Doing multiple things at once. Increases throughput.
Example: Adding more servers behind a load balancer.

2. Concurrency

Managing multiple tasks at once (time-slicing). Improves throughput even on a single core.
Example: Node.js Event Loop handling multiple requests.

3. Caching

Reducing the distance data travels. Improves latency drastically.
Example: Checking Redis before hitting Postgres.
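The Redis-before-Postgres pattern is usually called cache-aside. A minimal sketch, where a plain dict stands in for Redis and `slow_db_read` is a hypothetical stand-in for a Postgres query:

```python
# Cache-aside: check a fast in-memory cache before the slow backing store.

cache: dict[str, str] = {}  # stands in for Redis (~100 ns per lookup)

def slow_db_read(key: str) -> str:
    # Stands in for a Postgres query: imagine a ~10 ms round trip here.
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in cache:              # cache hit: memory-speed read
        return cache[key]
    value = slow_db_read(key)     # cache miss: pay the full database latency
    cache[key] = value            # populate so the next read is fast
    return value

assert get("user:42") == "value-for-user:42"   # miss: goes to the "database"
assert "user:42" in cache                      # subsequent reads stay in memory
```

In a real deployment you would also set a TTL and decide on an invalidation strategy; those concerns are out of scope for the sketch.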

Interview Guidance

Common Mistake

Confusing scaling (throughput) with optimization (latency). If an API is slow, adding more servers won't make it faster for a single user—it just allows more users to be slow at the same time.

The "Aha!" Moment

"If you can't reduce the distance (latency), increase the width of the pipe (throughput). If you can't increase the width, reduce the number of trips (caching)."

Interview Questions to Prep

Q: How does batching affect latency and throughput?

A: Improves throughput by reducing overhead per item, but increases latency for individual items (they must wait for the batch to fill).

Q: You have a global user base. How do you reduce latency?

A: CDN / Edge computing to move data closer to the user (reducing physical travel time).

Q: Why is a 99th percentile (p99) latency more important than the average?

A: Averages hide 'long tail' outliers. If 1% of your users wait 10 seconds while others wait 10ms, the average looks fine, but you're losing 1% of your users.
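The p99 argument can be checked with a toy distribution (hypothetical latencies, and a deliberately crude percentile calculation):

```python
# 99 requests at 10 ms, 1 request at 10,000 ms (a 10-second tail outlier).
latencies_ms = [10] * 99 + [10_000]

average = sum(latencies_ms) / len(latencies_ms)
# Crude p99: the value below which 99% of samples fall.
p99 = sorted(latencies_ms)[int(0.99 * len(latencies_ms))]

assert average == 109.9   # the average barely registers the outlier
assert p99 == 10_000      # the tail user waits 10 full seconds
```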