Latency vs Throughput
The two numbers that dictate the performance of every system. Understanding the difference is the first step in architectural thinking.
The Core Definitions
Latency
The time it takes for a single unit of data to travel from point A to point B.
Throughput
The amount of data that can be processed or transmitted in a given amount of time.
The Mental Model
The Highway Analogy
Imagine a highway connecting two cities.
- Latency: How fast a single car travels from City A to City B. (e.g., 1 hour)
- Throughput: How many cars arrive at City B per hour. (e.g., 1000 cars/hour)
(Diagram: a shorter pipe means less latency; a wider pipe means more throughput.)
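The analogy also hides a useful formula, Little's Law: throughput = concurrency / latency. A minimal sketch using the highway numbers above, assuming the road holds 1000 cars in flight at once (the concurrency figure is an assumption chosen so the numbers match):

```python
# Little's Law: throughput = concurrency / latency.
# With the highway numbers: each trip takes 1 hour (latency), and we
# assume the road holds 1000 cars in flight at once (concurrency),
# so 1000 cars arrive per hour (throughput).

def throughput(concurrency: float, latency_hours: float) -> float:
    """Cars arriving per hour, given cars in flight and trip time."""
    return concurrency / latency_hours

print(throughput(1000, 1.0))  # 1000.0 cars/hour
# Halving latency doubles throughput at the same concurrency:
print(throughput(1000, 0.5))  # 2000.0 cars/hour
```

The same law explains why you can raise throughput two different ways: shorten each trip (lower latency) or widen the road (more concurrency).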
Latency Numbers to Know
Jeff Dean's famous numbers. Memorize the orders of magnitude (approximate):
- L1 cache reference: ~0.5 ns
- Main memory reference: ~100 ns
- Send 1 KB over a 1 Gbps network: ~10 µs
- Read 4 KB randomly from SSD: ~150 µs
- Round trip within the same datacenter: ~500 µs
- Disk seek: ~10 ms
- Read 1 MB sequentially from disk: ~20 ms
- Packet round trip, California ↔ Netherlands: ~150 ms
The Fundamental Trade-off
Optimizing for Latency
- Moving computation closer to the user (edge)
- Caching hot data in memory
- Removing processing steps
- Using faster hardware
Optimizing for Throughput
- Batch processing requests
- Pipelining instructions
- Parallel processing (MapReduce)
- Asynchronous queues
Crucial Insight: Batching improves throughput but worsens latency for individual items.
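The batching trade-off can be made concrete with a toy cost model. The overhead and per-item costs below are assumed values, not measurements; the point is the shape of the curve:

```python
# Sketch: batching amortizes a fixed per-call overhead (assumed 10 ms)
# across many items, raising throughput -- but every item in the batch
# waits for the whole batch to finish, so per-item latency grows.

OVERHEAD_S = 0.010   # fixed cost per flush (round trip, syscall, ...)
PER_ITEM_S = 0.001   # marginal cost per item

def cost(batch_size: int) -> tuple[float, float]:
    """Return (per-item latency in seconds, items per second)."""
    batch_time = OVERHEAD_S + PER_ITEM_S * batch_size
    return batch_time, batch_size / batch_time

for n in (1, 10, 100):
    latency, thr = cost(n)
    print(f"batch={n:>3}  latency={latency * 1000:.0f}ms  throughput={thr:.0f}/s")
```

Running it shows throughput climbing roughly tenfold from batch size 1 to 100 while per-item latency climbs from ~11 ms to ~110 ms: both numbers go up together.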
How to Optimize
Parallelism
Doing multiple things at once. Increases throughput.
Example: Adding more servers behind a load balancer.
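A minimal stand-in for "more servers behind a load balancer", using a thread pool as the pool of servers (the 50 ms handler delay is an assumed, simulated I/O cost):

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Four workers stand in for four servers behind a load balancer.
# Each request spends ~50 ms blocked on I/O; four workers finish four
# requests in ~50 ms of wall time instead of ~200 ms sequentially.
# Throughput quadruples; each individual request is no faster.

def handle_request(i: int) -> int:
    time.sleep(0.05)  # simulated I/O (DB call, downstream API)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, range(4)))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # wall time ~ one request, not four
```

Note what this does and does not buy you: wall time for the batch drops, but each request still takes 50 ms. That is exactly the scaling-vs-optimization distinction in the interview section below.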
Concurrency
Managing multiple tasks at once (time-slicing). Improves throughput on a single core.
Example: Node.js Event Loop handling multiple requests.
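The same event-loop idea Node.js uses can be sketched with Python's asyncio: one thread interleaves many requests while each is blocked on I/O (the 50 ms delay is again an assumed, simulated I/O cost):

```python
import asyncio

# Event-loop concurrency: a single thread serves 10 requests at once
# by switching between them whenever one is waiting on I/O.

async def handle(i: int) -> int:
    await asyncio.sleep(0.05)  # yields the loop during simulated I/O
    return i

async def main() -> list[int]:
    # Ten requests overlap on one thread: ~0.05 s total, not ~0.5 s.
    return await asyncio.gather(*(handle(i) for i in range(10)))

print(asyncio.run(main()))
```

No parallelism is involved: there is still only one core doing work. The win comes from never idling while a request waits on the network or disk.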
Caching
Reducing the distance data travels. Improves latency drastically.
Example: checking Redis before hitting Postgres.
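A cache-aside sketch of that pattern. Here a plain dict plays the role of Redis and a stub function plays the Postgres query; both are hypothetical stand-ins, not real client calls:

```python
# Cache-aside: read from the cache first, fall back to the database,
# then populate the cache so the next read avoids the slow trip.

cache: dict[str, str] = {}           # stands in for Redis (in-memory, sub-ms)

def db_query(user_id: str) -> str:   # stands in for Postgres (ms + network)
    return f"profile-for-{user_id}"

def get_profile(user_id: str) -> str:
    if user_id in cache:             # cache hit: no database trip at all
        return cache[user_id]
    value = db_query(user_id)        # cache miss: pay the full latency once
    cache[user_id] = value           # populate for subsequent reads
    return value

get_profile("42")   # miss: hits the "database"
get_profile("42")   # hit: served from memory
```

A real implementation also needs an expiry/invalidation policy (e.g., a TTL), since stale cache entries are the classic failure mode of this pattern.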
Interview Guidance
Common Mistake
Confusing scaling (throughput) with optimization (latency). If an API is slow, adding more servers won't make it faster for a single user—it just allows more users to be slow at the same time.
The "Aha!" Moment
"If you can't reduce the distance (latency), increase the width of the pipe (throughput). If you can't increase the width, reduce the number of trips (caching)."
Interview Questions to Prep
Q: How does batching affect latency and throughput?
A: Improves throughput by reducing overhead per item, but increases latency for individual items (they must wait for the batch to fill).
Q: You have a global user base. How do you reduce latency?
A: CDN / Edge computing to move data closer to the user (reducing physical travel time).
Q: Why is a 99th percentile (p99) latency more important than the average?
A: Averages hide 'long tail' outliers. If 1% of your users wait 10 seconds while others wait 10ms, the average looks fine, but you're losing 1% of your users.
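The long-tail answer above can be checked with ten lines of arithmetic, using the hypothetical distribution from the answer (99% of requests at 10 ms, 1% at 10 s):

```python
import statistics

# 99 requests take 10 ms; 1 takes 10 s. The mean looks fine; p99 does not.
latencies_ms = [10.0] * 99 + [10_000.0]

mean = statistics.mean(latencies_ms)
p99 = sorted(latencies_ms)[int(0.99 * len(latencies_ms))]  # crude p99

print(f"mean={mean:.0f}ms  p99={p99:.0f}ms")  # mean ~110ms, p99 = 10000ms
```

A dashboard showing the ~110 ms mean would look healthy while one user in a hundred waits ten full seconds, which is why SLOs are usually written against p95/p99, not the average.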