Capstone: Design a URL Shortener
Every concept in this series, used in anger on one real problem
"Design a URL shortener." It sounds almost insultingly simple — turn a long URL into a short one, redirect when clicked. That's exactly why interviewers love it: there's nowhere to hide. Every decision you make reveals whether you actually own the concepts or just recognize them.
This page walks the full 5-step framework from a blank page, the way you would in a 45-minute interview or a real design review. Watch for the concepts from the series doing their jobs — they're linked as they appear.
Read each step's question first and try to answer it yourself before reading on. The gap between your answer and the walkthrough is exactly what to study next.
Step 1: Clarify the Requirements
Never start drawing. Start asking. Here's the conversation:
| Question you ask | Answer you get | What it decides |
|---|---|---|
| Custom short codes, or generated? | Generated; custom is a nice-to-have | Code-generation strategy |
| Do links expire? | Optional TTL, default forever | Storage growth, cleanup job |
| Click analytics? | Yes — counts, referrers | An async pipeline (it must never slow redirects) |
| Scale? | ~100M new links/month, read-heavy | The entire architecture |
| Latency target? | Redirect must feel instant (<100ms) | Cache strategy, maybe CDN |
Functional requirements: shorten a URL, redirect a short code, optional expiry, click analytics. Non-functional: redirects are latency-critical and must be highly available; creation can be slower. Out of scope (say it out loud): user accounts, link editing, spam scanning.
The most important sentence you can say in the first five minutes: "This is a read-heavy system — a link is created once and clicked thousands of times — so I'll optimize everything around the redirect path."
Step 2: Estimate the Scale
Back-of-envelope, as derived in the intro — powers of ten, not precision:
Writes (new links):
- 100M links/month ÷ ~2.6M seconds/month ≈ ~40 writes/sec average → ~120/sec at peak
- Trivial. Any database survives this.
Reads (redirects), assuming a 100:1 read ratio:
- 10B redirects/month ≈ ~4K reads/sec average → ~12K/sec at peak
- This is the number that designs the system.
Storage:
- ~500 bytes per record (short code, long URL, metadata) × 100M/month ≈ 50 GB/month, ~3 TB over 5 years
- Comfortable on one primary with replicas. We don't need to shard — and saying why not scores more points than sharding reflexively.
The design pressure: 12K reads/sec at <100ms, against a trickle of writes. That spells cache-heavy read path, replication for availability, and no exotic storage.
Step 3: Start Simple
Four boxes first. Complexity must be earned.
Two endpoints — a REST contract you can state in one breath:
| Endpoint | Does | Returns |
|---|---|---|
POST /links { "url": "https://very/long/path" } | Creates a short code | 201 + { "short": "qa.st/x7Kp9aQ" } |
GET /x7Kp9aQ | Looks up the code | 301/302 redirect to the long URL |
Now the two design decisions that make or break this system.
Decision 1: How do you generate the short code?
A 7-character Base62 code (a–z, A–Z, 0–9) gives 62⁷ ≈ 3.5 trillion combinations — at 100M links/month, that's ~3,000 years of namespace. The question is how to pick each code:
- No coordination — any server can generate
- Same URL hashes to the same code (free dedup)
- Truncating a hash invites collisions
- Must check-and-retry on every collision
- Zero collisions by construction
- Codes are short and dense from day one
- The counter is a single point of coordination
- Sequential codes are guessable (enumerate all links!)
The senior move is the hybrid: each app server leases a range of the counter (server A gets 1M–2M, server B gets 2M–3M) from a coordination store, then encodes locally — no per-request coordination, no collisions. Sprinkle in a random offset so codes aren't enumerable.
QA Lens
Collision handling is the test nobody writes. Force two different URLs onto the same code
(stub the hash) and assert the second gets a different code — not a silent overwrite
that redirects thousands of users to the wrong site. Then test enumerability: if
/x7Kp9aQ exists, does /x7Kp9aR leak someone else's private link?
Sequential codes turn your database into a public directory.
Decision 2: 301 or 302 redirect?
This tiny detail is a complete trade-off story:
301 Moved Permanently | 302 Found (temporary) | |
|---|---|---|
| Browser behavior | Caches the redirect — next click skips your server entirely | Asks your server every time |
| Server load | Lower (browsers absorb repeat clicks) | Full traffic, every click |
| Analytics | Lost after the first click per browser | Every click counted |
| Change/expire the link later | Can't — browsers remember the old target | Works |
Analytics was a requirement, so: 302 — and now you know why every commercial shortener
chooses it, accepting higher traffic as the price of seeing the clicks.
Step 4: Find the Bottlenecks
Attack the naive design with the peak numbers. 12K reads/sec against one database is rude — and unnecessary, because this workload is a cache's dream: tiny values, massively repeated keys, and the data never changes (a code always maps to the same URL), so invalidation — the genuinely hard part of caching — barely exists here.
Walk the redirect: load balancer → stateless app server → Redis hit (~1ms) for the hot 20% of links serving ~95% of traffic → on miss, fall back to a read replica and write back to the cache. Meanwhile the click event goes onto a queue — the user's redirect never waits for analytics.
Now interrogate it, component by component:
QA Lens
The redirect path also needs security tests: this service is an open-redirect
machine by design, which makes it a phishing favorite. Verify that creating a link to a
known-bad domain is rejected, that expired links return 410 Gone (not a redirect to
nowhere), and that the analytics queue being completely down doesn't add a single
millisecond to a redirect — async means async, and only a fault-injection test proves it.
Step 5: Say the Trade-offs Out Loud
Every choice above has a cost. Naming them unprompted is what separates "drew the boxes" from "owns the design":
| Choice | What we gained | What we paid |
|---|---|---|
| Cache-aside Redis | ~1ms reads for hot links | Cold-start stampede risk; one more component to operate |
| Counter ranges + Base62 | No collisions, no coordination per request | Range-lease bookkeeping; must defend against enumeration |
302 over 301 | Full click analytics, links stay editable | Every click hits our servers — we eat the traffic |
| Replicas, no shard | Operational simplicity; reads scale | A write ceiling we won't reach for years — revisit if write volume 100×s |
"It's read-heavy, so the redirect path is cache-first with replicas behind it, analytics go async over a queue so they can never slow a redirect, and I chose 302 over 301 to keep the click data — at the cost of serving every click. I'm deliberately not sharding: 3 TB over five years doesn't justify it." Four sentences, and every one is a justified trade-off.
Test Yourself
The follow-ups an interviewer would actually ask:
Q1. Marketing wants links that can be edited after creation to point somewhere new. What breaks, and what must change?
Two things: 301 would have been fatal (browsers cached the old destination forever — good
thing we chose 302), and the cache now has a real invalidation problem for the first time
— on edit, you must update the DB and delete the Redis key, or users get the stale target for
the TTL. The "data never changes" assumption was doing a lot of quiet work.
Q2. Write volume grows 100× — 4K creates/sec. Walk the write path: what gives out first?
The single primary. Options in order of escalation: fatter primary (vertical), then split the namespace across multiple primaries by code prefix — which is sharding, with the shard key chosen by hash so sequential counters don't create a hot shard. The counter-range scheme already distributes; it scales with you.
Q3. Users in Australia complain redirects take 400ms. Your servers are in Virginia. What's the fix, and what's the catch?
That's propagation latency — physics, not code. Move the lookup closer: regional deployments or a CDN/edge layer that can answer redirects from a replicated code→URL map. The catch: now the mapping is replicated globally, so an edited or expired link takes time to propagate — you've traded consistency lag for latency, which is PACELC in the wild.
Q4. Why is "the analytics pipeline lost 0.1% of click events" acceptable here, when "the payment queue lost 0.1% of charges" would be a disaster?
Requirements decide guarantees, not technology. Click counts are aggregate trends — approximate is fine, so at-least-once delivery with occasional loss under failure is a sane cost/benefit. Payments need every event exactly-once-processed with idempotency keys. Same queue technology, opposite contracts — which is why you ask about the requirement before designing the pipeline.
Quick Revision
Read-heavy (100:1), latency-critical reads, trickle of writes, data that never changes. A cache's dream.
Base62, 7 chars ≈ 3.5T codes. Counter ranges per server: no collisions, no bottleneck, not enumerable (add an offset).
LB → stateless servers → Redis → replica on miss. 302 keeps analytics; clicks go async via queue.
No shard, no microservices, no Kafka cluster — the numbers don't demand them. Complexity must be earned.
Where to Next
- The Cheat Sheet — every concept from the series on one printable page
- What is System Design? — the framework this page just exercised
- Then do it yourself: design a pastebin, an image host, or a rate limiter with the same five steps — they're all neighbors of this problem