Prerequisites
HTTP: The Protocol That Delivered the Web
Tim Berners-Lee's simple request-response protocol turned the internet into a global hypertext system.
TL;DR
By 1989, the internet existed but was mostly invisible — a tangle of FTP servers, Gopher menus, and email lists. Physicists at CERN couldn’t even share documents across departments. Tim Berners-Lee proposed a fix: combine hypertext (clickable links between documents) with TCP/IP (the network that connected everything). The result was three inventions — URLs, HTML, and HTTP. HTTP was the simplest: a text-based request-response protocol where a client asks for a resource and a server sends it back. No sessions, no login, no memory of previous requests. That radical statelessness made HTTP easy to implement, easy to cache, and easy to scale. Within four years, the web went from one server in a Swiss physics lab to millions of pages. HTTP didn’t just deliver documents — it delivered the platform the modern internet is built on.
”Vague, but Exciting”
In March 1989, Tim Berners-Lee — a 33-year-old software engineer at CERN — submitted a proposal titled “Information Management: A Proposal.” His boss, Mike Sendall, scribbled “Vague, but exciting” on the cover. It wasn’t an endorsement. It wasn’t a rejection. It was enough.
The problem was real. CERN had thousands of researchers rotating in and out, each using different computers, different formats, different documentation systems. Knowledge arrived and vanished with the people who carried it. Berners-Lee had seen this for a decade — he’d even built a personal hypertext tool called ENQUIRE in 1980 to track the web of relationships between people, projects, and machines at the lab.
His 1989 proposal was bigger. He wanted a system where any document could link to any other document, across any machine on the network, using a single consistent interface. Not a centralized database. Not a file server with a directory tree. A web — where information could be followed through links, one hop at a time.
He needed three things:
- A way to name any resource on the network — that became the URL
- A language for writing linked documents — that became HTML
- A protocol for requesting and delivering those documents — that became HTTP
By Christmas 1990, Berners-Lee had all three running on a NeXT workstation in his office. The first web server — info.cern.ch — and the first web browser (called WorldWideWeb) were the same machine.
The Simplest Protocol That Could Work
HTTP was deliberately primitive. Berners-Lee knew that complexity killed adoption — the internet was already full of powerful protocols nobody used because they were too hard to implement. Gopher required specific menu structures. WAIS needed complex query syntax. FTP demanded authentication and state.
HTTP 0.9, the first version, was a single line:
GET /index.html
That’s it. No headers, no metadata, no version number. The server would respond with the HTML content and immediately close the connection. You couldn’t even tell the server what format you wanted — it just sent whatever the file was.
This was crude, and it was exactly right. Any programmer could write an HTTP client in an afternoon. Any sysadmin could run a server. The protocol was so simple that mistakes were cheap and experimentation was free.
HTTP/1.0: Headers and Content Types
By 1993, the web was growing fast enough that HTTP 0.9’s limitations were painful. You couldn’t send images. You couldn’t tell the server who you were. You couldn’t indicate an error — the connection just closed, and you had to guess what went wrong.
HTTP/1.0 added headers — key-value pairs of metadata attached to both requests and responses:
GET /paper.html HTTP/1.0
Host: info.cern.ch
Accept: text/html
User-Agent: NCSA_Mosaic/2.0
HTTP/1.0 200 OK
Content-Type: text/html
Content-Length: 1024
<html>...
Three additions changed everything:
Status codes gave the server a vocabulary. 200 OK meant success. 404 Not Found meant the resource didn’t exist. 301 Moved Permanently meant it had a new address — follow the redirect. These three-digit numbers became so culturally embedded that “404” is a metaphor for absence outside of computing.
Content-Type headers (the MIME type) broke HTTP free from text. A server could now say “this response is image/gif” or "application/pdf", and the browser would know how to handle it. The web stopped being a hypertext system and became a hypermedia system — documents, images, video, anything.
Host headers let multiple websites share a single IP address. Without this, every domain needed its own machine. Virtual hosting made the web affordable to run at scale.
Statelessness: The Feature That Felt Like a Bug
Every HTTP request is independent. The server doesn’t remember who you are, what you asked for last, or whether you’re logged in. Each request carries everything the server needs to respond — and then the server forgets.
Request 1: GET /cart → "Who are you?"
Request 2: GET /cart → "Who are you?"
Request 3: GET /cart → "Who are you?"
This drove application developers crazy. How do you build a shopping cart if the server forgets you between clicks? How do you stay logged in? How do you have a multi-step workflow?
But statelessness was the web’s secret weapon. Because the server held no per-client state:
- Any server could handle any request — load balancers could spread traffic freely
- Crashed servers lost nothing — restart and keep going
- Caches worked naturally — if a response depends only on the request, intermediaries can store and reuse it
- The protocol scaled to billions of users without requiring servers to track billions of sessions
The workarounds for state — cookies (1994), sessions, tokens — were bolted on later. They were ugly but effective. And the underlying protocol remained clean.
# HTTP in 15 lines: a raw TCP request
import socket
# HTTP rides on top of TCP — open a connection, send text, read text
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.connect(("example.com", 80))
s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
response = b""
while chunk := s.recv(4096):
response += chunk
headers, body = response.split(b"\r\n\r\n", 1)
print(headers.decode()) # HTTP/1.0 200 OK ...
print(len(body), "bytes of HTML")
There’s no magic. HTTP is text over a TCP socket. The protocol that delivers the modern web is legible to anyone with telnet.
Methods: A Vocabulary for the Web
HTTP 0.9 had one verb: GET. Give me this resource. HTTP/1.0 added more, and HTTP/1.1 formalized them into a small, powerful vocabulary:
| Method | Meaning | Safe? | Idempotent? |
|---|---|---|---|
GET | Retrieve a resource | Yes | Yes |
POST | Submit data (create) | No | No |
PUT | Replace a resource | No | Yes |
DELETE | Remove a resource | No | Yes |
HEAD | GET without the body | Yes | Yes |
Safe means the request doesn’t change anything — you can GET a page a thousand times without side effects. Idempotent means doing it twice has the same effect as doing it once — DELETE the same resource twice and it’s still gone.
These properties aren’t enforced by the protocol — a badly written server could delete a database on GET. But they’re a contract. Browsers rely on GET being safe to prefetch links. Caches rely on idempotency to retry failed requests. The entire architecture of the web depends on servers honoring these semantics.
This vocabulary would later become the foundation of REST — the idea that HTTP’s methods aren’t just for fetching web pages, but for manipulating any resource on the internet.
HTTP/1.1: The Version That Ran the Web for 20 Years
HTTP/1.0 opened a new TCP connection for every single request. Load a page with 30 images? That’s 31 TCP handshakes — each one adding hundreds of milliseconds of latency.
HTTP/1.1, standardized in 1997, fixed this with persistent connections: keep the TCP socket open and reuse it for multiple requests. It also added:
- Chunked transfer encoding — the server can start sending before it knows the total size, enabling streaming
- Cache-control headers — fine-grained rules for how long intermediaries can store responses
- Content negotiation — the client says “I accept English or French, HTML or JSON” and the server picks the best match
- Pipelining — send multiple requests without waiting for responses (though this was so buggy in practice that browsers mostly disabled it)
HTTP/1.1 was released in 1997 and remained the dominant version until HTTP/2 arrived in 2015 — eighteen years as the protocol of the web. Few specifications in computing have been that durable.
What HTTP Got Right
Berners-Lee made a series of choices that looked like shortcuts but turned out to be architectural insights:
- Text-based protocol — HTTP is human-readable. You can debug it with
telnet, inspect it in a browser, understand it without a spec sheet. This made it learnable in a way that binary protocols never were, and accelerated adoption by orders of magnitude. - Statelessness — every request stands alone. This seemed like a limitation until the web needed to scale to billions of users across thousands of servers. Statelessness made load balancing, caching, and fault tolerance natural rather than engineered.
- The method/resource model — “apply this verb to this noun” (
GET /paper.html) is the simplest possible API. It’s so intuitive that the same model now powers most of the internet’s APIs, not just its web pages. - Extensibility through headers — HTTP didn’t try to anticipate every need. Instead, it gave clients and servers a generic way to exchange metadata. Cookies, caching, authentication, compression, content negotiation — all retrofitted through headers without changing the protocol itself.
HTTP’s great limitation was also its great virtue: it did almost nothing. It didn’t prescribe document formats — HTML handled that. It didn’t define addresses — the URL handled that. It didn’t encrypt anything — SSL/TLS handled that. HTTP was the minimal, universal transport layer that let everything else evolve independently.
One protocol, serving plain text from a NeXT workstation in Switzerland. That was the entire web in 1991. The protocol hasn’t fundamentally changed — it just carried more and more of the world on top of it.