The End-to-End Principle: Why the Internet Is Dumb on Purpose

#TL;DR

In 1984, three MIT researchers, Jerome Saltzer, David Reed, and David Clark, published a paper called “End-to-End Arguments in System Design.” It formalized a design choice TCP/IP had already made a decade earlier: don’t put intelligence in the network; put it at the endpoints. The paper argued that functions like reliable delivery, encryption, and duplicate suppression should generally live at the edges, because only the endpoints know enough to implement them completely and correctly. This single principle explains why IP is minimal, why TCP recovers from loss, why TLS encrypts end-to-end, and why networks that tried to be “smarter than IP” have so often lost to it.

#The Argument

The paper’s core claim is simpler than its name:

A function should be implemented in the network only if it can be implemented correctly there. If the endpoints have to implement it anyway — because the network can’t guarantee it — then doing it in the network is redundant and expensive.

The canonical example in the paper is reliable file transfer. Suppose you want to copy a file from machine A to machine B, and you want end-to-end certainty that B has the file intact. You could try to build a reliable network, one where every hop checksums every packet, retransmits on loss, and promises bit-perfect delivery.

It wouldn’t be enough.

Even with a perfectly reliable network, the file could be corrupted:

By a bug in A’s file-reading code.
By memory errors on either machine.
By a bug in the program doing the copy.
By disk corruption between when B receives the data and when B finishes writing it.

The only way to be sure the file arrived intact is for A to send a checksum and for B to verify the checksum after writing to disk. That’s an end-to-end check. And once you’re doing that check, the network’s hop-by-hop reliability hasn’t bought you much. You were going to catch the errors anyway.
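
The check described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: `transfer` and its `channel` parameter are hypothetical names, with `channel` standing in for everything between the two disks, and SHA-256 is used as the checksum only because it ships with the standard library.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash the bytes as they exist on disk, not a copy held in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer(src: Path, dst: Path, channel=lambda data: data) -> bool:
    """Copy src to dst through `channel`, then verify end to end.

    `channel` is a hypothetical stand-in for the network, the buffers,
    and the copy program: anything that could alter the bytes in flight.
    """
    expected = sha256_of_file(src)           # computed by A, sent with the file
    dst.write_bytes(channel(src.read_bytes()))
    return sha256_of_file(dst) == expected   # computed by B from what it stored
```

The important detail is where the second hash is taken: after the write, from the file B actually stored. A `channel` that flips a single byte makes `transfer` return `False` no matter how reliable each hop claimed to be.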

The paper’s conclusion: features that must exist at the endpoints shouldn’t be duplicated in the network. Let the network do the minimum. Put the intelligence where it can actually verify it’s working.

#What It Looked Like in Practice

TCP/IP embodies the principle in its layering. IP, the network layer, does almost nothing. It forwards packets based on a destination address. It doesn’t verify delivery, doesn’t reorder, doesn’t retransmit, doesn’t even care if the packet arrives at all. A router’s job description fits on one line: look at the destination, forward toward it, forget you ever saw it.

Reliability is a TCP concern. TCP lives on the endpoints — the two machines having the conversation. They track sequence numbers, ACK what they receive, retransmit what they don’t, and hide all of that from the application.

  Application                          Application
  ───────────                          ───────────
      │                                    │
  ┌───┴───┐                            ┌───┴───┐
  │  TCP  │ ◄── reliability here ──►   │  TCP  │
  └───┬───┘                            └───┬───┘
      │                                    │
  ┌───┴───┐   ┌─────┐   ┌─────┐   ┌────────┴───┐
  │  IP   │──►│  IP │──►│  IP │──►│     IP     │
  └───────┘   └─────┘   └─────┘   └────────────┘
                                         
                  ▲                        
                  │                        
              "dumb routers" — forward and forget

The router in the middle could die, be replaced by a different model, be a different link technology entirely — and the endpoints wouldn’t notice, because the endpoints are the ones responsible for making the conversation work.
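
To make the division of labor concrete, here is a toy simulation of endpoints recovering from a network that silently drops packets. It is a stop-and-wait sketch, far simpler than real TCP (no sliding window, no timers, no congestion control), and every name in it (`lossy_send`, `reliable_transfer`, `loss_rate`) is invented for illustration.

```python
import random


def lossy_send(packet, rng, loss_rate):
    """Hypothetical stand-in for the IP layer: delivers or silently drops."""
    return None if rng.random() < loss_rate else packet


def reliable_transfer(chunks, loss_rate=0.3, seed=1, max_tries=1000):
    """Stop-and-wait: send one numbered chunk, resend until it is ACKed."""
    rng = random.Random(seed)
    received = []        # receiver's in-order output
    expected_seq = 0     # receiver state: next sequence number it wants
    transmissions = 0

    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            transmissions += 1
            packet = lossy_send((seq, chunk), rng, loss_rate)
            if packet is None:
                continue                 # data lost: sender "times out" and resends
            # Receiver: accept only the expected chunk and drop duplicates,
            # but ACK either way so the sender can move on.
            if packet[0] == expected_seq:
                received.append(packet[1])
                expected_seq += 1
            ack = lossy_send(packet[0], rng, loss_rate)
            if ack == seq:
                break                    # ACK arrived: move to the next chunk
        else:
            raise TimeoutError(f"gave up on chunk {seq}")
    return received, transmissions
```

Note that `lossy_send` keeps no state at all. All of the bookkeeping lives in the sender and receiver, which is why the "network" here could be swapped for a different one without either endpoint changing.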

#Why Dumb Wins

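There’s a second half of the argument that often gets overlooked. Saltzer, Reed, and Clark didn’t just argue that endpoints should own reliability; they also pointed out that a function built into the network is paid for by every application, whether it needs it or not. With hindsight, networks that try to do more than forward packets also tend to get stuck. The reasons: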

Every network feature has a cost, even when you’re not using it. If the network layer provides reliable delivery, every packet pays the overhead, including ones that don’t need it (video frames you’d rather drop than delay, DNS queries that are cheaper to retry than to ACK).
A “smart” network ages badly. Features baked into routers are hard to change. The internet evolved TCP congestion control, TLS, QUIC, and HTTP/3 at the endpoints without touching the network core — that’s only possible because the core is minimal.
New applications need different guarantees. Real-time voice wants low latency and tolerates loss. File transfer wants bulk throughput. A smart network has to pick one. A dumb network lets each app pick for itself.

This is part of why the OSI protocol suite lost to TCP/IP’s much thinner stack: its seven-layer model standardized functions like session management and presentation encoding as layers of the stack itself, where TCP/IP leaves them to applications. TCP/IP was easier to evolve precisely because it promised less.

#Where the Principle Gets Violated

Once you learn the principle, you start seeing its violations everywhere. They’re usually defensible in the short term and regrettable in the long term:

NAT (Network Address Translation). A router rewriting addresses on packets breaks the end-to-end model — the endpoints no longer share a coherent view of their own addresses. NAT exists because IPv4 ran out of addresses, and it solved that problem. It also made peer-to-peer applications harder to build for twenty years and required protocols like STUN, TURN, and ICE to paper over.
Deep Packet Inspection. ISPs and enterprises routinely inspect packet payloads to classify traffic, enforce policy, or intercept it. The push to encrypt everything with TLS was partly a response to this.
Transparent caching. Web proxies that intercept HTTP and serve cached copies look like a performance optimization. They’re also a violation — now the “endpoint” the client thinks it’s talking to isn’t the actual origin. TLS killed most of these for a reason.
CGNAT and “carrier-grade” middleboxes. Modern mobile networks route most customers through NAT layers on the carrier side. The phone isn’t directly on the internet; it’s on a network behind a network, with the carrier’s boxes rewriting its packets and, in some deployments, terminating and re-originating its connections.
Protocol ossification. TCP extensions added years ago are sometimes still unusable on the public internet because some middlebox will reject packets it doesn’t understand. This is why QUIC runs over UDP, the transport middleboxes interfere with least, and why it encrypts nearly all of its own headers.

Each of these is a case where someone decided the network should do more than forward packets. Each of them creates a class of bugs that would have been impossible in a strictly end-to-end system.
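
The NAT case is easy to see in a toy model. The sketch below is an assumption-laden simplification: `ToyNat` is a made-up class, and real NATs also track the remote endpoint, the protocol, and timeouts. It is enough to show why an unsolicited inbound packet has nowhere to go.

```python
class ToyNat:
    """Minimal port-translating NAT, for illustration only."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}    # (private_ip, private_port) -> public_port
        self.back = {}   # public_port -> (private_ip, private_port)

    def outbound(self, src_ip, src_port):
        """Rewrite a private source address; create a mapping on first use."""
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, dst_port):
        """Return the private destination, or None if no mapping exists."""
        return self.back.get(dst_port)
```

Until the inside host has sent something, `inbound` returns `None` for every port, so a peer on the outside cannot start the conversation. The address the peer eventually sees is one the inside host never chose and cannot discover on its own, which is the gap STUN, TURN, and ICE exist to work around.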

#The Principle Is a Default, Not a Law

Saltzer, Reed, and Clark were careful. The paper doesn’t say every function must be at the endpoints — it says the burden of proof is on anyone who wants to put a function in the network. There are cases where a network-layer feature is justified:

Performance optimization that doesn’t require correctness guarantees. Link-layer checksums are fine — they catch common errors without promising anything.
Fairness and resource management. Congestion control has endpoint components (TCP) and network components (queuing, ECN) because neither side can do it alone.
Things the endpoints genuinely can’t do alone. DNS has to be a shared infrastructure service. An endpoint can’t ask a peer for its address before it knows how to reach that peer.
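
The congestion-control split mentioned in the list above can be sketched as additive-increase, multiplicative-decrease (AIMD), the general shape of classic TCP congestion avoidance. The function name, parameters, and the idea of feeding it one boolean per round trip are all simplifications chosen for this example; real stacks add slow start, timers, and much more.

```python
def aimd(signals, cwnd=1.0, increase=1.0, decrease=0.5, floor=1.0):
    """Return the congestion window after each round trip.

    `signals` holds one boolean per round trip. True means the network
    signalled congestion (a dropped packet or an ECN mark); the endpoint
    decides what to do about it.
    """
    history = []
    for congested in signals:
        if congested:
            cwnd = max(floor, cwnd * decrease)   # back off sharply
        else:
            cwnd += increase                     # probe for more bandwidth
        history.append(cwnd)
    return history


print(aimd([False, False, False, True, False]))  # [2.0, 3.0, 4.0, 2.0, 3.0]
```

The network’s only contribution is the boolean. Everything about how to react stays in the endpoint, which is why the reaction has been changeable over the years without redeploying routers.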

The principle is a design bias, not an absolute rule. When you can push a feature to the endpoints, do it — because that’s how the system stays evolvable. When you can’t, at least be honest that you’re making a compromise.

#What This Means Now

Fifty years after Cerf and Kahn sketched TCP in a hotel lobby, the end-to-end principle still explains almost every interesting property of the internet:

Encryption everywhere. TLS and its successors encrypt data between endpoints, because only the endpoints hold the keys. The network can’t decrypt, can’t inspect, and can’t alter content undetected. The dependency runs both ways: put a box in the middle that terminates the connection, and the encryption is no longer end-to-end.
The explosion of overlay networks. CDNs, peer-to-peer systems, Tor, and Tailscale all run over the public internet and treat the network in the middle as a plain packet carrier. They work because the network is dumb enough to let them.
QUIC and the UDP renaissance. The next generation of transport protocols moved to UDP specifically because UDP is one of the few surfaces where endpoints can still deploy something new without middleboxes getting in the way.

The TCP/IP post describes the hotel-lobby insight as “the network is dumb, the endpoints are smart.” The 1984 paper is what turned that tactical choice into a principle that could be argued for and applied elsewhere. Every time a system design discussion comes down to “should this happen at the endpoint or in the middle?”, someone in the room is re-deriving Saltzer, Reed, and Clark from first principles. Usually without knowing it.