APIs & Communication
The contracts that let two programs trust each other
A frontend engineer and a backend engineer have never spoken. They sit in different time zones, ship on different schedules, and write in different languages. Yet their code has to interlock perfectly on launch day. The only thing holding them together is a contract — an API.
The networking layer got bytes from one machine to another. This layer decides what those bytes mean. We'll walk the four styles you'll actually argue about in a design review (REST, GraphQL, WebSockets, Webhooks) and, just as important, when to reach for each.
APIs as a contract, then the four styles you'll defend in any design review: REST, GraphQL, WebSockets, and Webhooks. Every concept ends with a QA testing lens: how a tester would probe or break it.
APIs — the menu, not the kitchen
An API (Application Programming Interface) is a contract between two pieces of software. It declares what you can ask for, how to ask, and what comes back — and deliberately hides everything else.
The restaurant analogy
You don't walk into the kitchen and cook. You read the menu (the API docs), place an order (the request), and your food arrives (the response). You neither know nor care whether the kitchen uses a gas range or induction — that's an implementation detail behind the contract. A good API is a good menu: complete, unambiguous, and stable enough that you can reorder tomorrow and get the same dish.
The contract is what makes large systems buildable by many teams at once. Change the kitchen all you like; as long as the menu holds, every diner is fine.
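One way to make "the menu" concrete is to write the contract down as data and check requests against it. The sketch below is only an illustration: `USER_CONTRACT` and `contract_violations` are hypothetical names, and the schema is a hand-rolled stand-in for a real OpenAPI document, not the OpenAPI format itself.

```python
# Hypothetical, simplified contract for creating a user.
# A real project would load this from its OpenAPI/Swagger spec.
USER_CONTRACT = {
    "required": ["email", "name"],
    "properties": {"email": str, "name": str, "age": int},
}


def contract_violations(payload: dict, contract: dict) -> list:
    """Return every way `payload` breaks the documented contract."""
    problems = []
    for field in contract["required"]:
        if field not in payload:
            problems.append(f"missing required field: {field}")
    for field, value in payload.items():
        expected = contract["properties"].get(field)
        if expected is None:
            # Not on the menu: the contract never mentions this field.
            problems.append(f"undocumented field: {field}")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for field: {field}")
    return problems
```

A contract test would send a payload for which this list is non-empty and expect the server to reject it; a server that accepts it anyway has drifted from its own documentation.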
QA Lens
If the API is a contract, then the contract — not the implementation — is what you test.
Generate tests from the spec (OpenAPI/Swagger) so documentation and behavior can't drift apart:
a field documented as required but accepted as missing is a bug even though nothing
crashes. And probe what the menu doesn't offer — undocumented endpoints and verbs that
"work anyway" are unowned attack surface.
REST — resources and verbs
REST (Representational State Transfer) is the default API style of the web. Its core trick: treat everything as a resource with a URL, and use standard HTTP verbs to act on it. It maps almost one-to-one onto CRUD.
| You want to… | HTTP method | Example |
|---|---|---|
| List a collection | GET | GET /users |
| Read one item | GET | GET /users/123 |
| Create | POST | POST /users |
| Replace / update | PUT / PATCH | PUT /users/123 |
| Remove | DELETE | DELETE /users/123 |
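The table can be read as a routing rule: the method plus the URL pattern selects a CRUD operation. Here is a minimal sketch of that mapping, assuming an in-memory dictionary instead of a database and a hypothetical `handle` function instead of a real web framework.

```python
import re
from typing import Optional

users: dict = {}      # in-memory stand-in for a database table
next_id = 1           # the server, not the client, assigns ids on POST


def handle(method: str, path: str, body: Optional[dict] = None):
    """Route (method, path) to a CRUD operation and return (status, payload)."""
    global next_id
    if path == "/users":
        if method == "GET":                       # list the collection
            return 200, list(users.values())
        if method == "POST":                      # create a new resource
            user = {**(body or {}), "id": next_id}
            users[next_id] = user
            next_id += 1
            return 201, user
        return 405, None                          # verb not offered here
    match = re.fullmatch(r"/users/(\d+)", path)
    if match is None:
        return 404, None
    user_id = int(match.group(1))
    if user_id not in users:
        return 404, None
    if method == "GET":                           # read one item
        return 200, users[user_id]
    if method == "PUT":                           # replace the whole resource
        users[user_id] = {**(body or {}), "id": user_id}
        return 200, users[user_id]
    if method == "DELETE":                        # remove
        del users[user_id]
        return 204, None
    return 405, None
```

Calling `handle("POST", "/users", {"name": "Ada"})` creates a user, and `handle("GET", "/users/1")` then reads it back. PATCH is left out to keep the sketch short.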
The three principles worth memorizing: REST is stateless (no per-client session on the server), cacheable (responses can be reused), and offers a uniform interface (standard verbs and predictable URL patterns). That predictability is REST's superpower, and the root of its one real weakness.
If a screen needs a user and their posts and their followers, that's often three round trips to three endpoints. On a slow mobile network, three sequential requests is three chances to feel slow. This is exactly the gap GraphQL was built to close.
QA Lens
REST's predictability makes it a joy to test, but verify the verbs mean what they say.
GET must be safe (never mutates); PUT and
DELETE must be idempotent (call twice, same end state on the server). A
DELETE that errors on the second call because the row is "already gone" is a bug:
it should return 204 or 404, not 500.
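The DELETE check above is easy to automate. The sketch below uses two hypothetical in-memory fakes, `FakeUserApi` and `BuggyUserApi`, in place of a real HTTP client; against a live service the same check would issue two real DELETE requests.

```python
ACCEPTABLE_REPEAT_STATUSES = {204, 404}


class FakeUserApi:
    """Well-behaved fake: deleting a missing row is not an error."""

    def __init__(self):
        self.rows = {123: {"name": "Ada"}}

    def delete(self, user_id: int) -> int:
        self.rows.pop(user_id, None)
        return 204


class BuggyUserApi(FakeUserApi):
    """Fake with the bug described above: the second DELETE blows up."""

    def delete(self, user_id: int) -> int:
        try:
            del self.rows[user_id]
        except KeyError:
            return 500      # "already gone" wrongly surfaced as a server error
        return 204


def delete_is_idempotent(api, user_id: int) -> bool:
    """Call DELETE twice; the row must be gone and the repeat must not error."""
    first = api.delete(user_id)
    second = api.delete(user_id)
    return (
        first == 204
        and second in ACCEPTABLE_REPEAT_STATUSES
        and user_id not in api.rows
    )
```

`delete_is_idempotent(FakeUserApi(), 123)` passes, while the same call with `BuggyUserApi()` fails on the second request.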
GraphQL — ask for exactly what you need
GraphQL, born at Facebook, flips the model: instead of many fixed endpoints, there's one endpoint and the client describes the exact shape of data it wants. The server resolves it and returns precisely that — no more, no less.
Where REST might need /users/123 and /users/123/posts, GraphQL folds both into one query.
That kills over-fetching (downloading fields you ignore) and under-fetching (needing more calls).
But nothing is free:
REST
- Dead simple, universally understood
- HTTP caching works out of the box
- Easy to monitor per-endpoint
- Can over-/under-fetch for rich screens
GraphQL
- One request, exactly the fields you need
- Great for varied clients (web, mobile, TV)
- Caching is harder, since every query differs
- Resolvers add server complexity; deep nesting can hurt
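To make "one request, exactly the fields you need" concrete, here is a toy illustration. The query text shows the kind of document a client might send against a hypothetical schema; the `project` function is not a GraphQL engine, just a sketch of the core idea that the response mirrors the shape of the selection.

```python
# What a client might send (hypothetical schema with `user` and `posts`).
QUERY = """
{
  user(id: 123) {
    name
    posts { title }
  }
}
"""

# The same selection written as a nested dict; None marks a leaf field.
SELECTION = {"name": None, "posts": {"title": None}}

# Everything the server knows about the user (made-up sample data).
USER = {
    "id": 123,
    "name": "Ada",
    "email": "ada@example.com",
    "posts": [
        {"id": 1, "title": "Hello", "body": "First post"},
        {"id": 2, "title": "World", "body": "Second post"},
    ],
}


def project(data, selection: dict):
    """Keep only the fields named in `selection`, recursing into nested ones."""
    if isinstance(data, list):
        return [project(item, selection) for item in data]
    return {
        field: data[field] if sub is None else project(data[field], sub)
        for field, sub in selection.items()
    }
```

`project(USER, SELECTION)` keeps the user's name and each post's title and drops the rest. A REST client would typically receive the full user and the full posts from two endpoints and discard the unused fields itself.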
For most system design problems, REST is the safer default. Reaching for GraphQL signals breadth — just be ready to explain the caching and resolver costs you're taking on.
"GraphQL is REST v2 — newer, so better." GraphQL isn't an upgrade; it's a different trade-off. It moves complexity from the client (multiple calls, over-fetching) to the server (resolvers, query-depth limits, bespoke caching). If your API is straightforward CRUD with one or two client types, GraphQL adds cost and removes free HTTP caching — for nothing in return.
QA Lens
GraphQL widens the test surface in a sneaky way: the schema is the contract, so a renamed field breaks clients even though the endpoint URL never changed. Test the schema itself, watch for N+1 query explosions behind innocent-looking nested fields, and cap query depth so a single malicious request can't ask for friends-of-friends-of-friends forever.
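A depth cap can be sketched in a few lines. This assumes the query has already been parsed into a nested dict where `None` marks a leaf field; `selection_depth`, `check_depth`, and the default limit of 3 are hypothetical choices for illustration, and real GraphQL servers usually enforce this through a validation rule or plugin.

```python
from typing import Optional


def selection_depth(selection: Optional[dict]) -> int:
    """Number of nesting levels in a parsed selection (leaf fields count as 1)."""
    if not selection:
        return 0
    return 1 + max(selection_depth(sub) for sub in selection.values())


def check_depth(selection: dict, max_depth: int = 3) -> None:
    """Reject the query before any resolver runs if it nests too deeply."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")


# A friends-of-friends-of-friends query: user > friends > friends > friends > name
GREEDY = {"user": {"friends": {"friends": {"friends": {"name": None}}}}}
```

The check runs on the shape of the query alone, so a malicious request is rejected before it touches the database.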
WebSockets — when the server needs to talk first
REST and GraphQL are both client-initiated: the client asks, the server answers, the line goes quiet. But chat apps, live scores, and collaborative editors need the server to push the instant something changes. That's WebSockets.
A WebSocket starts life as an ordinary HTTP request, then upgrades into a persistent, two-way pipe. After that, either side can send at any time with no fresh-connection overhead — perfect for real-time features.
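The upgrade step is defined in RFC 6455: the client sends `Upgrade: websocket` with a random `Sec-WebSocket-Key`, and the server proves it understood by returning a `Sec-WebSocket-Accept` value derived from that key and a fixed GUID. The sketch below covers only that calculation. The function names are hypothetical, and header lookups are simplified (real header names are case-insensitive, and a real server also checks `Sec-WebSocket-Version`).

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, appended to the client's key before hashing.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute Sec-WebSocket-Accept: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(headers: dict):
    """Return (status, response_headers) for an HTTP upgrade request."""
    wants_upgrade = (
        headers.get("Upgrade", "").lower() == "websocket"
        and "upgrade" in headers.get("Connection", "").lower()
        and "Sec-WebSocket-Key" in headers
    )
    if not wants_upgrade:
        return 400, {}
    return 101, {                      # 101 Switching Protocols
        "Upgrade": "websocket",
        "Connection": "Upgrade",
        "Sec-WebSocket-Accept": accept_key(headers["Sec-WebSocket-Key"]),
    }
```

After the 101 response, the same TCP connection stops carrying HTTP and starts carrying WebSocket frames in both directions.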
The catch is state. Each open socket lives on a specific server, which must track it. That makes scaling harder and failures brutal:
- True real-time, low-overhead, bidirectional
- Ideal for chat, presence, live dashboards, multiplayer
- Stateful: the server remembers every connection
- Server dies → every socket on it drops at once
QA Lens
Stateful connections demand chaos testing the happy path never reveals. Kill the server mid-session: does the client reconnect and resync missed messages, or silently lose them? Test flaky networks, duplicate deliveries, and out-of-order messages. "It works on localhost" means nothing here; the bugs live in the reconnection logic.
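One common way to survive duplicates, reordering, and reconnects is to number every message. The sketch below assumes the server stamps each message with an increasing sequence number and can replay from a given number after a reconnect. That protocol and the `MessageInbox` class are assumptions made for illustration; WebSockets themselves provide neither.

```python
class MessageInbox:
    """Client-side buffer that hands messages to the app in order, once each."""

    def __init__(self):
        self.last_seq = 0       # highest sequence number delivered so far
        self.pending = {}       # early arrivals waiting for a gap to fill
        self.delivered = []     # what the application has actually seen

    def receive(self, seq: int, text: str) -> None:
        if seq <= self.last_seq or seq in self.pending:
            return              # duplicate delivery: ignore it
        self.pending[seq] = text
        # Release every message that is now contiguous with what we have.
        while self.last_seq + 1 in self.pending:
            self.last_seq += 1
            self.delivered.append(self.pending.pop(self.last_seq))

    def resume_from(self) -> int:
        """Sequence number to request on reconnect so the server can replay."""
        return self.last_seq + 1
```

A chaos test can feed this inbox messages 1, 3, 1 again, then 2, and assert that the application sees exactly 1, 2, 3 with no gaps or repeats.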
Webhooks — "don't call us, we'll call you"
Polling ("anything new yet? ...now? ...now?") is wasteful. Webhooks invert it: you register a
callback URL once, and the other system POSTs to it the moment an event happens. It's how
Stripe tells you a payment cleared and how GitHub triggers your CI on push.
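The hard part isn't receiving the call; it's reliability. What if your server was down when the webhook fired? Production-grade webhook systems lean on three safeguards:
- Retries: the sender re-delivers with backoff until you return 2xx
- Idempotency keys: re-delivered events aren't processed twice
- An event log: you can replay and audit what was received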
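On the sender's side, retrying with backoff might look like the sketch below. The delays, the retry count, and the `deliver` and `backoff_delays` names are hypothetical; real providers publish their own retry schedules, which are usually much longer and add jitter.

```python
import time


def backoff_delays(base: float = 1.0, factor: float = 2.0, retries: int = 4) -> list:
    """Exponentially growing waits between attempts, in seconds."""
    return [base * factor ** i for i in range(retries)]


def deliver(send, event: dict, delays: list, sleep=time.sleep) -> bool:
    """Call send(event) until it returns a 2xx status or the retries run out.

    `send` is any callable that posts the event to the receiver's callback
    URL and returns the HTTP status code; `sleep` is injectable for tests.
    """
    status = send(event)
    for delay in delays:
        if 200 <= status < 300:
            return True
        sleep(delay)
        status = send(event)
    return 200 <= status < 300
```

Because the sender cannot tell a lost request from a lost response, retries mean the receiver may see the same event more than once, which is why the receiver needs an idempotency key.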
WebSockets keep a live channel open between a client and a server (chat). Webhooks are one-off server-to-server HTTP callbacks for events (Stripe → your backend). Different problems, similar-sounding names.
QA Lens
Webhooks fail in the gaps you can't see locally. Force a retry by returning 500 once
and confirm the event is delivered at least once but processed exactly once —
exactly-once delivery is impossible over a network, which is why the idempotency key
exists. Verify the signature header
(an unsigned webhook endpoint is an open door for spoofed events). And always test the "we were
down for 10 minutes" replay — that's the scenario that pages someone at 3am.
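A receiver that passes those checks needs two things: signature verification and de-duplication by event id. The sketch below assumes a simplified scheme in which the signature is a hex HMAC-SHA256 of the raw body under a shared secret. The secret, the event fields, and the function names are all hypothetical; real providers such as Stripe define their own header formats, typically with a timestamp included.

```python
import hashlib
import hmac
import json

SECRET = b"example-shared-secret"   # hypothetical; issued by the sender in practice
processed_ids = set()               # idempotency keys already handled
ledger = []                         # stand-in for the real side effect


def sign(body: bytes, secret: bytes = SECRET) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_webhook(body: bytes, signature: str) -> int:
    """Return the HTTP status the endpoint would send back."""
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(sign(body), signature):
        return 401                  # spoofed or corrupted: do nothing
    event = json.loads(body)
    if event["id"] in processed_ids:
        return 200                  # duplicate: acknowledge, don't reprocess
    ledger.append(event["amount"])
    processed_ids.add(event["id"])
    return 200
```

Delivering the same signed event twice returns 200 both times but records the payment once. In production the processed-id set would live in durable storage and be updated in the same transaction as the side effect.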
Choosing a Style
Test Yourself
Answer from memory first, then check against the answer below each question.
Q1. Calling DELETE /users/123 twice returns 204 then 500. Which REST guarantee is broken, and what should the second call return?
Idempotency. DELETE must be safe to repeat — the second call should return 204 (or 404),
because "the row is already gone" is the successful outcome of a delete, not an error.
Q2. A mobile screen needs a user, their posts, and their followers. When does this justify GraphQL — and when doesn't it?
It justifies GraphQL when many different clients each need different field combinations and the round trips genuinely hurt (slow mobile networks). It doesn't when one backend-for-frontend endpoint or a composite REST endpoint would do — GraphQL's price is resolver complexity and losing plain HTTP caching.
Q3. Chat messages and Stripe payment confirmations are both "real-time events." Why does one use WebSockets and the other webhooks?
Chat is client ↔ server with a live, long-lived, two-way conversation — that's a WebSocket. A payment confirmation is a server → server one-off notification — that's a webhook (an HTTP callback with retries and an idempotency key). The receiver of a webhook doesn't hold a connection open; it just exposes a URL.
Q4. Your webhook endpoint went down for 10 minutes. What three mechanisms decide whether you lost money?
Retries (the sender re-delivers with backoff until you return 2xx), idempotency keys
(so re-delivered events aren't processed twice), and an event log (so you can replay and
audit what was received). Missing any one of the three turns a brief outage into lost or
duplicated events.
Quick Revision
APIs: a contract for what you can ask, how, and what returns. The menu, not the kitchen.
REST: resources + HTTP verbs. Stateless, cacheable, predictable. The safe default.
GraphQL: one query, exact fields. Kills over-fetching; harder to cache, resolver cost.
WebSockets: persistent two-way channel for real-time. Stateful, so scaling is the hard part.
Webhooks: server-to-server event callbacks. Need retries, idempotency, and an event log.
"It's read-heavy CRUD, so REST with HTTP caching. The live presence feature needs WebSockets, and we'll take payment confirmations from Stripe over a signed, idempotent webhook." One sentence, four correct tools.
Where to Next
Contracts move data around — but where does the data actually live?
- Data Storage — databases, indexing, caching, partitioning, and blobs
- Networking Foundations — the layer beneath these contracts