Capstone: Design a URL Shortener

Every concept in this series, used in anger on one real problem

"Design a URL shortener." It sounds almost insultingly simple — turn a long URL into a short one, redirect when clicked. That's exactly why interviewers love it: there's nowhere to hide. Every decision you make reveals whether you actually own the concepts or just recognize them.

This page walks the full 5-step framework from a blank page, the way you would in a 45-minute interview or a real design review. Watch for the concepts from the series doing their jobs — they're linked as they appear.

How to use this page

Read each step's question first and try to answer it yourself before reading on. The gap between your answer and the walkthrough is exactly what to study next.

Step 1: Clarify the Requirements

Never start drawing. Start asking. Here's the conversation:

Question you ask	Answer you get	What it decides
Custom short codes, or generated?	Generated; custom is a nice-to-have	Code-generation strategy
Do links expire?	Optional TTL, default forever	Storage growth, cleanup job
Click analytics?	Yes — counts, referrers	An async pipeline (it must never slow redirects)
Scale?	~100M new links/month, read-heavy	The entire architecture
Latency target?	Redirect must feel instant (<100ms)	Cache strategy, maybe CDN

Functional requirements: shorten a URL, redirect a short code, optional expiry, click analytics. Non-functional: redirects are latency-critical and must be highly available; creation can be slower. Out of scope (say it out loud): user accounts, link editing, spam scanning.

The most important sentence you can say in the first five minutes: "This is a read-heavy system — a link is created once and clicked thousands of times — so I'll optimize everything around the redirect path."

Step 2: Estimate the Scale

Back-of-envelope, as derived in the intro — powers of ten, not precision:

Writes (new links):

100M links/month ÷ ~2.6M seconds/month ≈ ~40 writes/sec average → ~120/sec at peak
Trivial. Any database survives this.

Reads (redirects), assuming a 100:1 read ratio:

10B redirects/month ≈ ~4K reads/sec average → ~12K/sec at peak
This is the number that designs the system.

Storage:

~500 bytes per record (short code, long URL, metadata) × 100M/month ≈ 50 GB/month, ~3 TB over 5 years
Comfortable on one primary with replicas. We don't need to shard — and saying why not scores more points than sharding reflexively.

The design pressure: 12K reads/sec at <100ms, against a trickle of writes. That spells cache-heavy read path, replication for availability, and no exotic storage.

Step 3: Start Simple

Four boxes first. Complexity must be earned.

The naive version — correct, just not fast yet

requestresponse· hover a node to trace it

Every create and every redirect hits the database directly — fine at 40 writes/sec, doomed at 12K reads/sec.

Two endpoints — a REST contract you can state in one breath:

Endpoint	Does	Returns
`POST /links` `{ "url": "https://very/long/path" }`	Creates a short code	`201` + `{ "short": "qa.st/x7Kp9aQ" }`
`GET /x7Kp9aQ`	Looks up the code	`301/302` redirect to the long URL

Now the two design decisions that make or break this system.

Decision 1: How do you generate the short code?

A 7-character Base62 code (a–z, A–Z, 0–9) gives 62⁷ ≈ 3.5 trillion combinations — at 100M links/month, that's ~3,000 years of namespace. The question is how to pick each code:

Hash the URL vs count and encode

Hash-basedmd5(url) → take 7 chars

No coordination — any server can generate
Same URL hashes to the same code (free dedup)
Truncating a hash invites collisions
Must check-and-retry on every collision

Best forSimple systems where occasional collision retries are acceptable.

Counter + Base62next ID → encode as code

Zero collisions by construction
Codes are short and dense from day one
The counter is a single point of coordination
Sequential codes are guessable (enumerate all links!)

Best forHigher scale — with ranges pre-allocated per server to kill the bottleneck.

The senior move is the hybrid: each app server leases a range of the counter (server A gets 1M–2M, server B gets 2M–3M) from a coordination store, then encodes locally — no per-request coordination, no collisions. Sprinkle in a random offset so codes aren't enumerable.

QA Lens Collision handling is the test nobody writes. Force two different URLs onto the same code (stub the hash) and assert the second gets a different code — not a silent overwrite that redirects thousands of users to the wrong site. Then test enumerability: if /x7Kp9aQ exists, does /x7Kp9aR leak someone else's private link? Sequential codes turn your database into a public directory.

Decision 2: `301` or `302` redirect?

This tiny detail is a complete trade-off story:

	`301` Moved Permanently	`302` Found (temporary)
Browser behavior	Caches the redirect — next click skips your server entirely	Asks your server every time
Server load	Lower (browsers absorb repeat clicks)	Full traffic, every click
Analytics	Lost after the first click per browser	Every click counted
Change/expire the link later	Can't — browsers remember the old target	Works

Analytics was a requirement, so: 302 — and now you know why every commercial shortener chooses it, accepting higher traffic as the price of seeing the clicks.

Step 4: Find the Bottlenecks

Attack the naive design with the peak numbers. 12K reads/sec against one database is rude — and unnecessary, because this workload is a cache's dream: tiny values, massively repeated keys, and the data never changes (a code always maps to the same URL), so invalidation — the genuinely hard part of caching — barely exists here.

The production version — every box earned

requestresponse· hover a node to trace it

The redirect path never touches the primary — Redis serves hot links, and clicks ride the queue off the hot path.

Walk the redirect: load balancer → stateless app server → Redis hit (~1ms) for the hot 20% of links serving ~95% of traffic → on miss, fall back to a read replica and write back to the cache. Meanwhile the click event goes onto a queue — the user's redirect never waits for analytics.

Now interrogate it, component by component:

Redis dies?

Reads fall through to replicas — slower, but alive. Test the stampede: can the DB take the cold-cache burst?

Primary dies?

Redirects keep working from cache + replicas (reads don't need the primary). Creation pauses until failover promotes a replica.

A link goes viral?

A hot key is the cache's best case — one key, millions of hits, ~1ms each. The DB never feels it.

Analytics worker falls behind?

The queue absorbs the backlog; redirects are unaffected. Alert on queue depth before it grows unbounded.

Same URL submitted twice?

Idempotent create: return the existing code instead of minting a duplicate — one URL, one code.

QA Lens The redirect path also needs security tests: this service is an open-redirect machine by design, which makes it a phishing favorite. Verify that creating a link to a known-bad domain is rejected, that expired links return 410 Gone (not a redirect to nowhere), and that the analytics queue being completely down doesn't add a single millisecond to a redirect — async means async, and only a fault-injection test proves it.

Step 5: Say the Trade-offs Out Loud

Every choice above has a cost. Naming them unprompted is what separates "drew the boxes" from "owns the design":

Choice	What we gained	What we paid
Cache-aside Redis	~1ms reads for hot links	Cold-start stampede risk; one more component to operate
Counter ranges + Base62	No collisions, no coordination per request	Range-lease bookkeeping; must defend against enumeration
`302` over `301`	Full click analytics, links stay editable	Every click hits our servers — we eat the traffic
Replicas, no shard	Operational simplicity; reads scale	A write ceiling we won't reach for years — revisit if write volume 100×s

The closing line in an interview

"It's read-heavy, so the redirect path is cache-first with replicas behind it, analytics go async over a queue so they can never slow a redirect, and I chose 302 over 301 to keep the click data — at the cost of serving every click. I'm deliberately not sharding: 3 TB over five years doesn't justify it." Four sentences, and every one is a justified trade-off.

Test Yourself

The follow-ups an interviewer would actually ask:

Q1. Marketing wants links that can be edited after creation to point somewhere new. What breaks, and what must change?

Two things: 301 would have been fatal (browsers cached the old destination forever — good thing we chose 302), and the cache now has a real invalidation problem for the first time — on edit, you must update the DB and delete the Redis key, or users get the stale target for the TTL. The "data never changes" assumption was doing a lot of quiet work.

Q2. Write volume grows 100× — 4K creates/sec. Walk the write path: what gives out first?

The single primary. Options in order of escalation: fatter primary (vertical), then split the namespace across multiple primaries by code prefix — which is sharding, with the shard key chosen by hash so sequential counters don't create a hot shard. The counter-range scheme already distributes; it scales with you.

Q3. Users in Australia complain redirects take 400ms. Your servers are in Virginia. What's the fix, and what's the catch?

That's propagation latency — physics, not code. Move the lookup closer: regional deployments or a CDN/edge layer that can answer redirects from a replicated code→URL map. The catch: now the mapping is replicated globally, so an edited or expired link takes time to propagate — you've traded consistency lag for latency, which is PACELC in the wild.

Q4. Why is "the analytics pipeline lost 0.1% of click events" acceptable here, when "the payment queue lost 0.1% of charges" would be a disaster?

Requirements decide guarantees, not technology. Click counts are aggregate trends — approximate is fine, so at-least-once delivery with occasional loss under failure is a sane cost/benefit. Payments need every event exactly-once-processed with idempotency keys. Same queue technology, opposite contracts — which is why you ask about the requirement before designing the pipeline.

Quick Revision

The shape

Read-heavy (100:1), latency-critical reads, trickle of writes, data that never changes. A cache's dream.

Short codes

Base62, 7 chars ≈ 3.5T codes. Counter ranges per server: no collisions, no bottleneck, not enumerable (add an offset).

Read path

LB → stateless servers → Redis → replica on miss. 302 keeps analytics; clicks go async via queue.

Restraint

No shard, no microservices, no Kafka cluster — the numbers don't demand them. Complexity must be earned.

Where to Next

The Cheat Sheet — every concept from the series on one printable page
What is System Design? — the framework this page just exercised
Then do it yourself: design a pastebin, an image host, or a rate limiter with the same five steps — they're all neighbors of this problem

Step 1: Clarify the Requirements​

Step 2: Estimate the Scale​

Step 3: Start Simple​

Decision 1: How do you generate the short code?​

Decision 2: 301 or 302 redirect?​

Step 4: Find the Bottlenecks​

Step 5: Say the Trade-offs Out Loud​

Test Yourself​

Quick Revision​

Where to Next​