Rate Limiting
Protecting your services from cascading failure and malicious spikes. The traffic cop of the distributed world.
Why Rate Limit?
1. Availability (DDoS)
Prevents a single malicious user (or a buggy service) from saturating all app server threads.
2. Cost Management
Limits hits to expensive external APIs (e.g., Stripe, OpenAI) to prevent bankrupting the company.
The Algorithms
There are several canonical algorithms for rate limiting; the three most common are covered below. Choosing the right one depends on whether you care about **burst tolerance** or **memory efficiency**.
1. Token Bucket
A container (bucket) has a maximum capacity. Tokens are added at a fixed rate.
- Each request consumes 1 token.
- If 0 tokens remain, the request is dropped.
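The refill-and-consume logic above can be sketched in a few lines. This is a minimal single-process illustration (class and parameter names are mine, not from any particular library):

```python
import time

class TokenBucket:
    """Token bucket: refills continuously at `rate` tokens/sec, up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity        # start full, so initial bursts are allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1          # each request consumes 1 token
            return True
        return False                  # bucket empty: drop the request
```

Note the bucket starts full: that is what lets token bucket absorb a burst up to `capacity` while still enforcing the long-run `rate`.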
2. Leaky Bucket (Fixed Window)
Requests are processed at a constant rate, regardless of the incoming burst.
- Great for background jobs (ingestion).
- Not ideal for user-facing APIs (bursts beyond the queue capacity are dropped).
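The leaky bucket is usually modeled as a bounded FIFO queue that drains at a constant rate. A minimal sketch (illustrative names, same caveats as above):

```python
import time
from collections import deque

class LeakyBucket:
    """Requests queue up to `capacity` and drain at a constant `leak_rate`/sec."""

    def __init__(self, leak_rate: float, capacity: int):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.queue = deque()
        self.last = time.monotonic()

    def _leak(self):
        # Process (drain) whole requests at the fixed rate, regardless of bursts.
        now = time.monotonic()
        drained = int((now - self.last) * self.leak_rate)
        if drained:
            self.last = now
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()

    def offer(self, request) -> bool:
        self._leak()
        if len(self.queue) >= self.capacity:
            return False              # bucket overflow: request dropped
        self.queue.append(request)
        return True
```

The constant drain rate is what makes it a good fit for ingestion pipelines, and what makes it frustrating for interactive APIs: a burst fills the queue and everything past `capacity` is rejected.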
3. Sliding Window Log
Stores a timestamp for every request. On a new request, prune the timestamps that fall outside the window, then check the remaining count. Perfectly accurate at window boundaries, but memory grows with the request rate.
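The prune-then-count step maps directly onto a deque of timestamps. A single-process sketch (names are illustrative):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keeps one timestamp per accepted request inside the window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()            # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        # Prune timestamps that have fallen out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```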
Distributed Realities
The Race Condition Problem
In a distributed environment (multiple app servers), you cannot keep the count in local memory: each server would enforce its own private limit. You need a central store like Redis, and the check-and-increment against it must be atomic, otherwise two servers can both read the same count and both allow a request that should have been blocked.
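The race is easiest to see as an interleaving. Here a plain dict stands in for Redis, and two "servers" each do a non-atomic read-modify-write (GET, then SET):

```python
LIMIT = 10
store = {"user:42": 9}        # user has already made 9 requests

count_a = store["user:42"]    # server A reads 9 -> under the limit, allows
count_b = store["user:42"]    # server B also reads 9 before A writes (the race)
store["user:42"] = count_a + 1
store["user:42"] = count_b + 1

# Both requests were allowed: 11 requests got through, yet the counter
# shows only 10. The check-and-increment must be one atomic operation.
```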
Interview Guidance
"Where should I put the limit?"
Don't just say "the code". Discuss the API Gateway (NGINX/Envoy) for blanket, high-level protection, and the **application level** for user-specific quotas.
The "LUA" Flex
Mentioning that you'd use a Lua script for atomicity in Redis shows you've actually built high-scale systems, not just read about them.
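One common shape for this, sketched here with redis-py (the script and function names are mine, and this is a fixed-window counter for brevity): the `INCR` and `EXPIRE` run inside one Lua script, so no other client can interleave between the read and the write.

```python
# Atomic fixed-window check via a Redis Lua script (assumes redis-py).
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""

def allowed(client, key: str, limit: int, window_seconds: int) -> bool:
    """One round trip; the check-and-increment is atomic on the server."""
    count = client.eval(RATE_LIMIT_LUA, 1, key, window_seconds)
    return count <= limit
```

Usage would be `allowed(redis.Redis(), "rl:user:42", 100, 60)` for 100 requests per minute; the single `EVAL` call is what closes the race window described above.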