System Design Cheat Sheet

The whole series, compressed. Print it, or skim it the night before.

This page is deliberately plain — no animations, no demos — so it prints cleanly and reads fast. Every row links back to the full concept if a definition has gone fuzzy.

The 9 Questions That Shape Every Design

Requirements · Scale · Data · APIs · Reliability · Performance · Consistency · Operations · Cost

Ask all nine, every time. The questions never change; the answers build wildly different systems.

The 5-Step Framework

1. Clarify requirements: what's in, what's explicitly out
2. Estimate scale: users × actions/day ÷ 86,400 ≈ avg/sec, then ×3–4 for peak (shortcut: per-day total ÷ 100,000 ≈ per-second rate)
3. Start simple: 3–4 boxes; complexity must be earned
4. Find bottlenecks: for every box, ask "what if X dies / gets 10× the traffic / goes cold?"
5. Name trade-offs: every technique's cost, said out loud
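
The scale estimate in step 2 is plain arithmetic, and a small sketch makes it concrete. The function names and the example workload (10 million users, 20 actions each per day) are illustrative assumptions, not figures from the series.

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(daily_users: int, actions_per_user: float,
                 peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (average, peak) requests per second for a daily workload."""
    average = daily_users * actions_per_user / SECONDS_PER_DAY
    return average, average * peak_factor


def shortcut_qps(actions_per_day: float) -> float:
    """Mental-math shortcut: treat a day as 100,000 seconds."""
    return actions_per_day / 100_000


# Hypothetical workload: 10M daily users, 20 actions each.
average, peak = estimate_qps(10_000_000, 20)
rough = shortcut_qps(10_000_000 * 20)
print(f"average ~ {average:,.0f}/s, peak ~ {peak:,.0f}/s, shortcut ~ {rough:,.0f}/s")
```

The shortcut reads about 14% low because a day has 86,400 seconds rather than 100,000; the peak multiplier is far larger than that error, so the shortcut is fine for whiteboard estimates.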

The Honest Defaults

Reach for these unless a requirement forces otherwise — and be ready to say which requirement:

Decision	Default	Move off it only when…
Architecture	Monolith	Team scaling or per-component scaling demands services
Database	SQL (Postgres)	A specific access pattern or scale forces NoSQL
API style	REST	Many varied clients (GraphQL) / real-time (WebSockets) / events (Webhooks)
Scaling order	Up → out → replicate → shard	Never skip ahead; sharding is the last resort
Consistency	Strong where money/safety, eventual elsewhere	—

Networking — full page

Concept	The one-liner	The trap
Client–Server	One asks, one answers; the contract is the only fixed thing	Trusting client input or server uptime
IP	Every box in your diagram has an address; arrows = machines dialing IPs	Hardcoded IPs in configs and tests
DNS	Name → IP, cached at every hop, each with a TTL	"Propagation delay" = old caches expiring; stale `/etc/hosts`
Proxies	Forward shields clients; reverse shields servers (TLS, caching, LB)	App sees proxy's IP unless `X-Forwarded-For` is honored
Latency	Propagation + serialization + processing + queuing	Averages lie — measure p99, not p50
HTTP/HTTPS	Stateless protocol; status codes are a contract; the TLS handshake adds round trips (one in TLS 1.3, two in TLS 1.2)	An endpoint that also answers plain HTTP
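
To see why the Latency row says averages lie, here is a minimal sketch that compares the mean with p50 and p99 on a made-up sample. The `percentile` helper uses the nearest-rank definition, which is one of several common conventions, and the latency values are invented for illustration.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Invented sample: 97 fast requests and 3 very slow ones.
latencies_ms = [20] * 97 + [2000] * 3

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
print(f"mean={mean_ms:.1f}ms p50={p50_ms}ms p99={p99_ms}ms")
```

In this sample the mean and the median both look healthy, while the p99 exposes the two-second requests that a few users in every hundred actually experience.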

APIs — full page

Style	Use when	The cost
REST	Default — CRUD, cacheable, predictable	Over/under-fetching on rich screens
GraphQL	Many clients, each wanting different fields	Resolver complexity, no free HTTP caching, N+1 risk
WebSockets	Live two-way: chat, presence, dashboards	Stateful — scaling and reconnection are on you
Webhooks	Server→server "tell me when X happens"	Needs retries + idempotency keys + event log

Verb contracts: GET never mutates · PUT/DELETE are idempotent · delivery is at-least-once, so handlers must be idempotent to get an exactly-once effect.
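
One way to make at-least-once delivery safe is to record each event ID under a unique constraint in the same transaction as the side effect. The sketch below uses Python's built-in `sqlite3` module; the table names, the `handle_payment_event` function, and the account data are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory database with a dedupe table and one account."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER NOT NULL)")
    db.execute("INSERT INTO balances VALUES ('acct-1', 0)")
    db.commit()
    return db


def handle_payment_event(db: sqlite3.Connection, event_id: str,
                         account: str, cents: int) -> bool:
    """Apply the event once. Return True if applied, False if it was a duplicate."""
    try:
        with db:  # one transaction: commits on success, rolls back on any exception
            # The primary key rejects a repeated event_id, so there is no
            # separate "check, then insert" step for a retry to race through.
            db.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
            db.execute("UPDATE balances SET cents = cents + ? WHERE account = ?",
                       (cents, account))
        return True
    except sqlite3.IntegrityError:
        return False


db = make_store()
first = handle_payment_event(db, "evt-1", "acct-1", 500)
retry = handle_payment_event(db, "evt-1", "acct-1", 500)  # redelivery of the same event
balance = db.execute("SELECT cents FROM balances WHERE account = 'acct-1'").fetchone()[0]
```

The design choice worth noting is that the dedupe record and the balance change commit or roll back together, so a crash between them cannot leave an event marked as processed but not applied.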

Data Storage — full page

Concept	The one-liner	The trap
DB types	Relational / document / key-value / graph — pick by access pattern	Asking "which DB is best?"
SQL vs NoSQL	ACID + joins vs flexible + horizontal scale	"NoSQL is faster" — it just moves the work into your app
Indexing	B-tree, O(log n) reads; slower writes, more storage	Tiny test datasets hide missing indexes — read `EXPLAIN`
Vertical partitioning	Split a wide table by columns / access pattern	The multi-table write is no longer atomic for free
Caching	Cache-aside + Redis = the biggest read win	Invalidation, cold-start stampedes, thundering herds
Denormalization	Duplicate data to kill joins on read-heavy paths	Copies drift — never propose it without a sync plan
Blob storage	Files in S3, URL in the database	Orphans on either side of the seam; "private" URLs that aren't
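
The cache-aside pattern from the Caching row can be sketched in a few lines. This version uses an in-process dict as a stand-in for Redis, and the `CacheAside` class, its TTL, and the loader callback are all assumptions made for the example rather than any library's API.

```python
import time
from typing import Any, Callable


class CacheAside:
    """Read path: try the cache, fall back to the source on a miss, then fill the cache."""

    def __init__(self, load_from_source: Callable[[str], Any],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # hit: the source is not touched
        value = self._load(key)                  # miss or expired: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the source so the next read refetches."""
        self._entries.pop(key, None)


# Usage with a dict standing in for the database.
source = {"user:1": "Ada"}
loads: list[str] = []


def load(key: str) -> str:
    loads.append(key)
    return source[key]


cache = CacheAside(load)
cache.get("user:1")
cache.get("user:1")            # served from cache; loads still has one entry
source["user:1"] = "Grace"     # a write that skips invalidation...
stale = cache.get("user:1")    # ...leaves the old value visible until TTL or invalidate
```

The last two lines show the invalidation trap from the table: the write path has to call `invalidate` (or accept staleness up to the TTL). This sketch also does nothing about stampedes, where many concurrent misses all hit the source at once.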

Scaling — full page

Concept	The one-liner	The trap
Vertical	Bigger box; zero code changes	Hard ceiling, single point of failure
Horizontal	More boxes behind a load balancer	Servers must be stateless (sessions → shared store)
Load balancer	Round robin / least-connections / weighted + health checks	The LB itself is a SPOF until it's redundant
Replication	Writes → primary, reads → replicas; failover via promotion	Lag; async failover can lose acked writes
Sharding	Split by rows to scale writes (horizontal partitioning)	Hot shards, cross-shard queries, `% N` reshuffling
Consistent hashing	Ring placement — adding a shard moves only ~1/N of keys	Skipping it and migrating everything under load
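
The difference between `% N` placement and a hash ring shows up clearly when a shard is added. The sketch below is a minimal ring with virtual nodes; the shard names, the 100 virtual nodes per shard, and the use of SHA-256 as the hash are illustrative assumptions.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        # Each node is placed at many points on the ring to even out its share.
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._points[idx][1]


def moved_fraction(before, after, keys) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
old_nodes = ["shard-0", "shard-1", "shard-2", "shard-3"]
old_ring = HashRing(old_nodes)
new_ring = HashRing(old_nodes + ["shard-4"])

ring_moved = moved_fraction(old_ring.node_for, new_ring.node_for, keys)
mod_moved = moved_fraction(lambda k: stable_hash(k) % 4,
                           lambda k: stable_hash(k) % 5, keys)
print(f"ring moved {ring_moved:.0%} of keys, modulo moved {mod_moved:.0%}")
```

Going from four shards to five, the ring should move roughly a fifth of the keys, all of them onto the new shard, while modulo placement reshuffles roughly four fifths.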

Distributed Systems — full page

Concept	The one-liner	The trap
CAP	During a partition: consistency or availability	Sorting databases into boxes — most are tunable
PACELC	No partition? You're still trading latency vs consistency	Forgetting that this everyday trade-off bites more often than partitions do
CDN	Pull-through edge caches near users	"Deployed but users see the old version" — cache busting
Idempotency	Same request N times = same effect; keys make retries safe	The concurrent-retry race in "check then insert"
Timeouts/retries	Every call gets a deadline; retry with backoff + jitter	Retry storms — 3 retries = 4× load at the worst moment
Circuit breaker	After enough failures, fail fast; probe half-open to recover	A breaker that never closes again
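
Retry with backoff and jitter, from the Timeouts/retries row, might look like the sketch below. It uses the "full jitter" variant (sleep a random amount between zero and an exponentially growing cap); the function name, default delays, and the choice of which exceptions count as retryable are assumptions for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T],
                       max_retries: int = 3,
                       base_delay: float = 0.1,
                       max_delay: float = 2.0,
                       retry_on: tuple = (ConnectionError, TimeoutError),
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Callable[[], float] = random.random) -> T:
    """Call `call`, retrying transient failures with exponential backoff and full jitter."""
    attempt = 0
    while True:
        try:
            return call()
        except retry_on:
            if attempt >= max_retries:
                raise                      # budget spent: surface the failure
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)             # jitter spreads out synchronized clients
            attempt += 1


# Usage: a hypothetical dependency that fails twice, then recovers.
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("upstream too slow")
    return "ok"


result = retry_with_backoff(flaky, sleep=lambda seconds: None)
```

With `max_retries=3`, a dependency that stays down receives four calls per request, which is the 4× load the table warns about. The deadline on each individual call is assumed to live inside `call` itself.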

Architecture Patterns — full page

Concept	The one-liner	The trap
Microservices	Independent deploy/scale/teams; each owns its DB	The bugs move into the gaps — contract tests or bust
Message queues	Async decoupling + buffering; consumers drain at their pace	At-least-once delivery → consumers must be idempotent; poison messages
Rate limiting	Token bucket / sliding window at the gateway, counter in Redis	Per-instance counters = limit × number of servers
API gateway	One front door: auth, routing, limits, aggregation	One unprotected route leaks the whole backend
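
A token bucket, one of the two algorithms named in the Rate limiting row, fits in a short class. This is an in-process sketch with hypothetical names and an injectable clock for testing; it is not the shared Redis counter the table recommends.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)   # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False                     # caller would answer 429 with Retry-After


# Usage with a fake clock: burst of 3, then 1 request per second.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=1, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]       # fourth request is rejected
fake_now[0] = 2.0                                # two seconds pass, two tokens refill
after_wait = [bucket.allow() for _ in range(3)]
```

Because the state lives in one process, running this on several servers reproduces the trap in the table: the effective limit becomes the configured limit times the number of instances.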

Numbers Worth Knowing Cold

Number	Meaning
~100ms / ~1s / 3s+	Feels instant / feels sluggish / users leave
86,400 ≈ 10⁵	Seconds per day — per day ÷ 100,000 ≈ per second
3–4×	Peak-to-average traffic multiplier for estimates
100:1	Typical read:write ratio for feeds/content systems
62⁷ ≈ 3.5T	7-char Base62 namespace (short codes, IDs)
p50 vs p99	The median lies; the p99 writes the angry review
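
The Base62 figure can be checked directly, and the encoding itself is short. In this sketch the alphabet order (digits, then lowercase, then uppercase) is an arbitrary assumption; any fixed ordering of the 62 symbols works as long as encode and decode agree.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols
BASE = len(ALPHABET)


def base62_encode(number: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    """Inverse of base62_encode."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number


namespace_size = BASE ** 7                        # 3,521,614,606,208 distinct 7-char codes
largest_code = base62_encode(namespace_size - 1)  # the last code that still fits in 7 chars
```

At a hypothetical 1,000 new codes per second, a namespace of about 3.5 trillion lasts on the order of a century, which is why seven characters is a common choice for short links.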

The QA Attack Checklist

The cross-cutting tests that find real system-level defects, in one list:

Two nodes, not one — statefulness, sticky sessions, and per-instance counters all hide on a single node
Write here, read there — replication lag and read-your-own-writes, made explicit
Same request twice (then twice concurrently) — idempotency and its race window
Kill it mid-flight — the primary under write load, a WebSocket server mid-session, an upload at 50%
Slow, not dead — inject latency, not errors; watch threads pile up upstream
Cold everything — empty cache under load, cold CDN region, queue replay after downtime
The N+1th request — rate limit boundaries: clean 429 with Retry-After, never a 500
The seams — DB row ↔ blob file, service ↔ service contracts, cache ↔ source of truth
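
The "same request twice, then twice concurrently" check can be automated with a small harness. In the sketch below, `IdempotentService` is a hypothetical stand-in for the system under test, and `fire_concurrently` uses a barrier to release all threads at the same moment so the race window is as wide as possible.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class IdempotentService:
    """Stand-in system under test: applies each request key at most once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seen: set[str] = set()
        self.applied = 0

    def handle(self, key: str) -> bool:
        with self._lock:  # the check and the insert happen as one atomic step
            if key in self._seen:
                return False
            self._seen.add(key)
            self.applied += 1
            return True


def fire_concurrently(handler: Callable[[str], bool], key: str, workers: int = 20) -> list[bool]:
    """Send the same request from `workers` threads, released simultaneously."""
    barrier = threading.Barrier(workers)

    def attempt() -> bool:
        barrier.wait()  # every thread blocks here until all have arrived
        return handler(key)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(attempt) for _ in range(workers)]
        return [future.result() for future in futures]


service = IdempotentService()
outcomes = fire_concurrently(service.handle, "req-1")
```

The assertion a test would make is that exactly one outcome is `True` and `service.applied` is 1. Removing the lock from `handle` turns it into the check-then-insert race, though a race like that only fails intermittently, so a single green run proves little.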

Keep Going

Capstone: Design a URL Shortener — watch every row of this page do its job in one design
What is System Design? — the mindset, if you landed here first
