Rate Limiting
Protecting your services from cascading failure and malicious spikes. The traffic cop of the distributed world.
Why Rate Limit?
1. Availability (DDoS)
Prevents a single malicious user (or a buggy service) from saturating all app server threads.
2. Cost Management
Limits hits to expensive external APIs (e.g., Stripe, OpenAI) to prevent bankrupting the company.
The Algorithms
There are several canonical algorithms for rate limiting; the three most common are covered below. Choosing the right one depends on whether you care about **burst tolerance** or **memory efficiency**.
1. Token Bucket
A container (bucket) has a maximum capacity. Tokens are added at a fixed rate.
- Each request consumes 1 token.
- If 0 tokens remain, the request is dropped.
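The refill-and-consume logic above can be sketched in a few lines. This is a minimal single-process illustration (class and parameter names are mine, not from any particular library):

```python
import time

class TokenBucket:
    """Token bucket: refills continuously at `rate` tokens/sec, up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity        # start full, so initial bursts are allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1          # each request consumes 1 token
            return True
        return False                  # bucket empty: drop the request
```

Note the bucket starts full: that is what lets token bucket absorb a burst up to `capacity` while still enforcing the long-run `rate`.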
2. Leaky Bucket (Fixed Window)
Requests are processed at a constant rate, regardless of the incoming burst.
- Great for background jobs (ingestion).
- Not ideal for user-facing APIs (bursts beyond the queue capacity are dropped).
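The leaky bucket is usually modeled as a bounded FIFO queue that drains at a constant rate. A minimal sketch (illustrative names, same caveats as above):

```python
import time
from collections import deque

class LeakyBucket:
    """Requests queue up to `capacity` and drain at a constant `leak_rate`/sec."""

    def __init__(self, leak_rate: float, capacity: int):
        self.leak_rate = leak_rate
        self.capacity = capacity
        self.queue = deque()
        self.last = time.monotonic()

    def _leak(self):
        # Process (drain) whole requests at the fixed rate, regardless of bursts.
        now = time.monotonic()
        drained = int((now - self.last) * self.leak_rate)
        if drained:
            self.last = now
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()

    def offer(self, request) -> bool:
        self._leak()
        if len(self.queue) >= self.capacity:
            return False              # bucket overflow: request dropped
        self.queue.append(request)
        return True
```

The constant drain rate is what makes it a good fit for ingestion pipelines, and what makes it frustrating for interactive APIs: a burst fills the queue and everything past `capacity` is rejected.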
3. Sliding Window Log
Stores a timestamp for every request. On a new request, prune the timestamps that fall outside the window, then check the remaining count. Perfectly accurate at window boundaries, but memory grows with the request rate.
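The prune-then-count step maps directly onto a deque of timestamps. A single-process sketch (names are illustrative):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keeps one timestamp per accepted request inside the window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()            # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        # Prune timestamps that have fallen out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```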
Distributed Realities
The Race Condition Problem
In a distributed environment (multiple app servers), you cannot keep the count in local memory: each server would enforce its own private limit. You need a central store like Redis, and the check-and-increment against it must be atomic, otherwise two servers can both read the same count and both allow a request that should have been blocked.
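The race is easiest to see as an interleaving. Here a plain dict stands in for Redis, and two "servers" each do a non-atomic read-modify-write (GET, then SET):

```python
LIMIT = 10
store = {"user:42": 9}        # user has already made 9 requests

count_a = store["user:42"]    # server A reads 9 -> under the limit, allows
count_b = store["user:42"]    # server B also reads 9 before A writes (the race)
store["user:42"] = count_a + 1
store["user:42"] = count_b + 1

# Both requests were allowed: 11 requests got through, yet the counter
# shows only 10. The check-and-increment must be one atomic operation.
```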
Interview Guidance
"Where should I put the limit?"
Don't just say "the code". Discuss the API Gateway (NGINX/Envoy) for blanket, high-level protection, and the **application level** for user-specific quotas.
The "LUA" Flex
Mentioning that you'd use a Lua script for atomicity in Redis shows you've actually built high-scale systems, not just read about them.
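One common shape for this, sketched here with redis-py (the script and function names are mine, and this is a fixed-window counter for brevity): the `INCR` and `EXPIRE` run inside one Lua script, so no other client can interleave between the read and the write.

```python
# Atomic fixed-window check via a Redis Lua script (assumes redis-py).
RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""

def allowed(client, key: str, limit: int, window_seconds: int) -> bool:
    """One round trip; the check-and-increment is atomic on the server."""
    count = client.eval(RATE_LIMIT_LUA, 1, key, window_seconds)
    return count <= limit
```

Usage would be `allowed(redis.Redis(), "rl:user:42", 100, 60)` for 100 requests per minute; the single `EVAL` call is what closes the race window described above.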