System Design Cheat Sheet

The whole series, compressed. Print it, or skim it the night before.

This page is deliberately plain — no animations, no demos — so it prints cleanly and reads fast. Every row links back to the full concept if a definition has gone fuzzy.


The 9 Questions That Shape Every Design

Requirements · Scale · Data · APIs · Reliability · Performance · Consistency · Operations · Cost

Ask all nine, every time. The questions never change; the answers build wildly different systems.

The 5-Step Framework

  1. Clarify requirements — what's in, what's explicitly out
  2. Estimate scale — users × actions per user per day ÷ 86,400 ≈ average requests/sec, then ×3–4 for peak (shortcut: per-day count ÷ 100,000 ≈ per-second rate)
  3. Start simple — 3–4 boxes; complexity must be earned
  4. Find bottlenecks — ask "what if X dies / 10×s / goes cold?" for every box
  5. Name trade-offs — every technique's cost, said out loud
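Step 2 can be run as a back-of-envelope calculator. This is a minimal Python sketch, not a capacity-planning tool. The function names and the example inputs (10 million users, 20 actions each per day) are hypothetical. The 3.5 default is simply the midpoint of the 3–4× peak range.

```python
# Back-of-envelope traffic estimate for step 2 of the framework.
# Function names and example inputs are hypothetical.

SECONDS_PER_DAY = 86_400


def estimate_rps(users: int, actions_per_user_per_day: float,
                 peak_multiplier: float = 3.5) -> tuple[float, float]:
    """Return (average requests/sec, peak requests/sec)."""
    per_day = users * actions_per_user_per_day
    avg = per_day / SECONDS_PER_DAY
    return avg, avg * peak_multiplier


def shortcut_rps(per_day: float) -> float:
    """Mental-math shortcut: divide by 100,000 instead of 86,400 (reads slightly low)."""
    return per_day / 100_000


avg, peak = estimate_rps(10_000_000, 20)
print(f"avg ≈ {avg:,.0f}/s, peak ≈ {peak:,.0f}/s")        # avg ≈ 2,315/s, peak ≈ 8,102/s
print(f"shortcut ≈ {shortcut_rps(10_000_000 * 20):,.0f}/s")  # shortcut ≈ 2,000/s
```

The shortcut lands about 14% under the exact figure because 100,000 is larger than 86,400. That is close enough for deciding between one box and twenty.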

The Honest Defaults

Reach for these unless a requirement forces otherwise — and be ready to say which requirement:

Decision | Default | Move off it only when…
Architecture | Monolith | Team scaling or per-component scaling demands services
Database | SQL (Postgres) | A specific access pattern or scale forces NoSQL
API style | REST | Many varied clients (GraphQL) / real-time (WebSockets) / events (Webhooks)
Scaling order | Up → out → replicate → shard | Never skip ahead; sharding is the last resort
Consistency | Strong where money/safety, eventual elsewhere

Networking

Concept | The one-liner | The trap
Client–Server | One asks, one answers; the contract is the only fixed thing | Trusting client input or server uptime
IP | Every box in your diagram has an address; arrows = machines dialing IPs | Hardcoded IPs in configs and tests
DNS | Name → IP, cached at every hop, each with a TTL | "Propagation delay" is just old caches expiring; stale /etc/hosts entries
Proxies | Forward shields clients; reverse shields servers (TLS, caching, LB) | The app sees the proxy's IP unless X-Forwarded-For is honored
Latency | Propagation + serialization + processing + queuing | Averages lie; measure p99, not just the mean or p50
HTTP/HTTPS | Stateless protocol; status codes are a contract; the TLS handshake adds round trips (one with TLS 1.3, two with TLS 1.2) | An endpoint that also answers plain HTTP
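A quick illustration of why you measure p99 and not the average or p50. This minimal sketch uses made-up latency samples and the nearest-rank percentile definition. That is one of several definitions in use, and `statistics.quantiles` interpolates differently. The `percentile` helper is hypothetical.

```python
import math
import statistics

# Hypothetical latency samples in milliseconds: 95 fast requests, 5 very slow ones.
samples = [20] * 95 + [900] * 5


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


print(f"mean={statistics.fmean(samples):.0f}ms "
      f"p50={percentile(samples, 50):.0f}ms "
      f"p99={percentile(samples, 99):.0f}ms")
# prints: mean=64ms p50=20ms p99=900ms
```

The mean (64 ms) describes no request that actually happened, and the median says everything is fine. Only the p99 shows the 900 ms tail that 1 in 20 users hit.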

APIs

Style | Use when | The cost
REST | Default: CRUD, cacheable, predictable | Over/under-fetching on rich screens
GraphQL | Many clients, each wanting different fields | Resolver complexity, no free HTTP caching, N+1 risk
WebSockets | Live two-way: chat, presence, dashboards | Stateful; scaling and reconnection are on you
Webhooks | Server→server "tell me when X happens" | Needs retries + idempotency keys + event log

Verb contracts: GET never mutates · PUT/DELETE are idempotent · webhook and queue delivery is at-least-once, so handlers must be idempotent for each effect to happen exactly once.
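An idempotency key makes an at-least-once handler safe. This is a minimal single-process sketch, and it assumes the client sends a unique key per logical request. The `IdempotencyStore` class and its `execute` method are hypothetical. Across several servers the claim would have to live in shared storage, for example a database unique constraint or Redis `SET key value NX`.

```python
import threading
from typing import Callable

_IN_PROGRESS = object()  # sentinel: key claimed, work not finished yet


class IdempotencyStore:
    """Hypothetical in-memory store mapping idempotency keys to results."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: dict[str, object] = {}

    def execute(self, key: str, operation: Callable[[], object]) -> object:
        # Check and claim in one critical section, so two concurrent retries
        # can't both see "not found" and both run the operation.
        with self._lock:
            if key in self._results:
                existing = self._results[key]
                if existing is _IN_PROGRESS:
                    raise RuntimeError(f"request {key!r} is still in progress")
                return existing              # replay: return the stored result
            self._results[key] = _IN_PROGRESS
        try:
            result = operation()
        except Exception:
            with self._lock:
                del self._results[key]       # release the claim so a later retry can run
            raise
        with self._lock:
            self._results[key] = result
        return result


charges = []


def charge() -> str:
    charges.append(100)                      # the side effect we must not repeat
    return f"charge-{len(charges)}"


store = IdempotencyStore()
print(store.execute("key-abc", charge))      # charge-1
print(store.execute("key-abc", charge))      # charge-1 again; charge() did not run
print(len(charges))                          # 1
```

The important detail is that the lookup and the claim happen under one lock. If checking for the key and inserting it are two separate steps, two concurrent retries can both see "not found" in between and both do the work.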

Data Storage

Concept | The one-liner | The trap
DB types | Relational / document / key-value / graph; pick by access pattern | Asking "which DB is best?"
SQL vs NoSQL | ACID + joins vs flexible + horizontal scale | "NoSQL is faster"; it just moves the work into your app
Indexing | B-tree, O(log n) reads; slower writes, more storage | Tiny test datasets hide missing indexes; read EXPLAIN
Vertical partitioning | Split a wide table by columns / access pattern | The multi-table write is no longer atomic for free
Caching | Cache-aside + Redis = the biggest read win | Invalidation, cold-start stampedes, thundering herds
Denormalization | Duplicate data to kill joins on read-heavy paths | Copies drift; never propose it without a sync plan
Blob storage | Files in S3, URL in the database | Orphans on either side of the seam; "private" URLs that aren't
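Here is cache-aside in miniature. In this hedged sketch an in-process dict stands in for Redis and a plain function stands in for the database. The `CacheAside` class and its parameters are hypothetical. It shows the read path (check the cache, fall back to the source, populate the cache) and delete-on-write invalidation. It deliberately does nothing about stampedes, which in practice need request coalescing or a short per-key lock.

```python
import time
from typing import Callable


class CacheAside:
    """Cache-aside read path with a TTL; a dict stands in for Redis (hypothetical helper)."""

    def __init__(self, load: Callable[[str], object], ttl_seconds: float) -> None:
        self._load = load                    # reads the source of truth, e.g. the database
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> object:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # fresh hit: the database is never touched
        value = self._load(key)              # miss or expired: read through to the source
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after every write to the source of truth; skipping this is how caches go stale.
        self._entries.pop(key, None)


db = {"user:1": "Ada"}
db_calls = []


def load_from_db(key: str) -> object:
    db_calls.append(key)                     # count trips to the source of truth
    return db[key]


cache = CacheAside(load_from_db, ttl_seconds=60)
cache.get("user:1")
cache.get("user:1")
print(len(db_calls))                         # 1: the second read was a cache hit

db["user:1"] = "Ada L."                      # write to the source of truth...
cache.invalidate("user:1")                   # ...then drop the cached copy
print(cache.get("user:1"), len(db_calls))    # Ada L. 2
```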

Scaling

Concept | The one-liner | The trap
Vertical | Bigger box; zero code changes | Hard ceiling, single point of failure
Horizontal | More boxes behind a load balancer | Servers must be stateless (sessions → shared store)
Load balancer | Round robin / least-connections / weighted + health checks | The LB itself is a SPOF until it's redundant
Replication | Writes → primary, reads → replicas; failover via promotion | Lag; async failover can lose acknowledged writes
Sharding | Split by rows to scale writes (horizontal partitioning) | Hot shards, cross-shard queries, hash % N reshuffling every time N changes
Consistent hashing | Ring placement: adding a shard moves only ~1/N of keys | Skipping it and migrating everything under load
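The two placement schemes behave very differently when a shard is added. This minimal sketch compares `hash % N` with a consistent-hash ring as a fifth shard joins. The `HashRing` class, the node names, and the choice of 100 virtual nodes per shard are assumptions. SHA-256 is used only because Python's built-in `hash()` of strings is randomized per process.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """64-bit hash that is the same in every process (unlike Python's salted hash())."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper for illustration)."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []   # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node gets many points on the ring so load spreads evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(10_000)]

# Modulo placement: going from 4 to 5 shards changes hash % N for most keys.
moved_mod = sum(stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys)

ring = HashRing(["db-0", "db-1", "db-2", "db-3"])
before = {k: ring.node_for(k) for k in keys}
ring.add("db-4")
moved_ring = sum(before[k] != ring.node_for(k) for k in keys)

print(f"modulo moved {moved_mod / len(keys):.0%}, ring moved {moved_ring / len(keys):.0%}")
```

Expect modulo to move roughly 80% of keys, since a key stays put only when `h % 4 == h % 5`. Expect the ring to move roughly 20%, and every one of those keys lands on the new shard.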

Distributed Systems

Concept | The one-liner | The trap
CAP | During a partition: consistency or availability | Sorting databases into boxes; most are tunable
PACELC | No partition? You're still trading latency vs consistency | Forgetting that the everyday trade-off bites far more often than partitions do
CDN | Pull-through edge caches near users | "Deployed but users see the old version": cache busting
Idempotency | Same request N times = same effect; keys make retries safe | The concurrent-retry race in "check then insert"
Timeouts/retries | Every call gets a deadline; retry with backoff + jitter | Retry storms: 3 retries = 4× load at the worst moment
Circuit breaker | After enough failures, fail fast; probe half-open to recover | A breaker that never closes again
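Timeouts and retries can be sketched as capped exponential backoff with "full jitter" (a random delay between zero and the backoff ceiling) inside an overall deadline. This is a minimal Python sketch. The function name, the defaults, and the `retry_on` tuple are hypothetical, and the per-call timeout is assumed to be enforced inside `call` itself.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    deadline_s: float = 5.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying transient failures with capped exponential backoff + full jitter."""
    start = time.monotonic()
    for attempt in range(max_retries + 1):      # 1 initial try + max_retries retries
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise                            # out of retries: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            delay = random.uniform(0, ceiling)   # full jitter de-synchronizes clients
            if time.monotonic() - start + delay > deadline_s:
                raise                            # waiting would blow the overall deadline
            sleep(delay)
    raise AssertionError("unreachable")


attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")      # the first two calls fail
    return "ok"


print(retry_with_backoff(flaky, sleep=lambda s: None), len(attempts))  # ok 3
```

Only errors listed in `retry_on` are retried, because retrying something like a validation error just multiplies load. The `sleep` parameter exists so tests can skip the real waits.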
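Here is a circuit breaker with closed, open, and half-open states, as a minimal single-threaded sketch. The class, its thresholds, and the injectable clock are hypothetical. A production breaker would also need locking so that only one probe runs while half-open.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures; half-open after cool-down."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")   # don't touch the sick dependency
            self.state = "half-open"                     # cool-down over: let a probe through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self.state = "closed"                            # any success (including a probe) closes it
        self._failures = 0
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"                          # a failed probe restarts the cool-down
            self._opened_at = self._clock()


now = [0.0]                                              # fake clock for the demo
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def down() -> str:
    raise ConnectionError("dependency down")


for _ in range(2):
    try:
        breaker.call(down)
    except ConnectionError:
        pass
print(breaker.state)                                     # open

now[0] = 11.0                                            # past the cool-down
print(breaker.call(lambda: "recovered"), breaker.state)  # recovered closed
```

A failed probe reopens the breaker and restarts the cool-down, and any success closes it again. Forgetting that second transition is how you end up with a breaker that never closes.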

Architecture Patterns

Concept | The one-liner | The trap
Microservices | Independent deploy/scale/teams; each owns its DB | The bugs move into the gaps; contract tests or bust
Message queues | Async decoupling + buffering; consumers drain at their own pace | At-least-once delivery means consumers must be idempotent; poison messages
Rate limiting | Token bucket / sliding window at the gateway, counter in Redis | Per-instance counters make the effective limit = limit × number of servers
API gateway | One front door: auth, routing, limits, aggregation | One unprotected route leaks the whole backend
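Token bucket rate limiting can be sketched in-process. The `TokenBucket` class and its injectable clock are hypothetical. Because the state lives in one process, running this on each of N servers reproduces exactly the per-instance trap from the table. The shared version keeps the same two numbers (tokens and last refill time) in Redis and updates them atomically, typically with a Lua script.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens/second.
    Each request spends one token; an empty bucket means reject (HTTP 429)."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill lazily for the time elapsed since the last check, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

    def retry_after(self) -> float:
        """Seconds until one token is available, as of the last allow() call."""
        return max(0.0, (1 - self._tokens) / self.rate)


now = [0.0]                                   # fake clock for the demo
bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: now[0])
print([bucket.allow() for _ in range(4)])     # [True, True, True, False]

now[0] = 1.0                                  # one second later, one token has refilled
print(bucket.allow())                         # True
```

`retry_after` returns a float, but the `Retry-After` header takes whole seconds, so round it up before sending a 429.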

Numbers Worth Knowing Cold

Number | Meaning
~100ms | Feels instant; ~1s feels sluggish; at 3s+ users leave
86,400 ≈ 10⁵ | Seconds per day, so a per-day count ÷ 100,000 ≈ per-second rate (about 14% low)
3–4× | Peak-to-average traffic multiplier for estimates
100:1 | Typical read:write ratio for feeds/content systems
62⁷ ≈ 3.5T | 7-char Base62 namespace (short codes, IDs)
p50 vs p99 | The median lies; the p99 writes the angry review
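The 62⁷ ≈ 3.5T figure comes from the encoding it sizes, shown here as a minimal Base62 sketch. The alphabet order (digits, then uppercase, then lowercase) is an assumption. Implementations differ, and every part of the system must agree on one order.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # 62 symbols


def base62_encode(n: int) -> str:
    """Encode a non-negative integer (e.g. a database ID) as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(f"{62**7:,}")              # 3,521,614,606,208 distinct 7-character codes
print(base62_encode(62**7 - 1))  # zzzzzzz: the largest 7-character code
```

Codes for small IDs come out shorter than 7 characters. Left-pad them with `ALPHABET[0]` if you need fixed-width codes.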

The QA Attack Checklist

The cross-cutting tests that find real system-level defects, in one list:

  • Two nodes, not one — statefulness, sticky sessions, and per-instance counters all hide on a single node
  • Write here, read there — replication lag and read-your-own-writes, made explicit
  • Same request twice (then twice concurrently) — idempotency and its race window
  • Kill it mid-flight — the primary under write load, a WebSocket server mid-session, an upload at 50%
  • Slow, not dead — inject latency, not errors; watch threads pile up upstream
  • Cold everything — empty cache under load, cold CDN region, queue replay after downtime
  • The N+1th request — rate limit boundaries: clean 429 with Retry-After, never a 500
  • The seams — DB row ↔ blob file, service ↔ service contracts, cache ↔ source of truth
