
The Three-Way Handshake, Packet by Packet

Why TCP needs exactly three packets to open a connection, what the SYN and ACK bits actually do, and why the initial sequence number being predictable became a security disaster.

#TL;DR

Every TCP connection starts with three packets. The client sends a SYN, the server sends a SYN-ACK, the client sends an ACK. Two packets would be enough to synchronize state, but not enough to defend against duplicated or delayed packets from previous connections. Three is the minimum that makes the conversation both synchronized and unambiguous. The sequence numbers exchanged during those three packets become the coordinate system for everything else TCP does, and when Kevin Mitnick exploited their predictability in 1994, an entire class of connection-hijacking attacks got a name.

#The Problem TCP Solves at Startup

Before two machines can exchange data over TCP, they need to agree on three things:

  1. A connection exists. Both sides know they’re talking to each other and have resources allocated.
  2. A starting point for numbering. Each direction of the conversation gets its own sequence space. Every byte sent is numbered, which is how TCP handles retransmission, ordering, and duplicate detection.
  3. The other side is actually reachable right now. Not just that a packet arrived — that the other side can also talk back to you, with this particular address, at this particular time.

The three-way handshake is the minimum packet exchange that achieves all three and is robust against stale packets from previous connections drifting in from the network.

#SYN, SYN-ACK, ACK

Here’s what each of the three packets carries. The diagram is familiar, but the details matter.

Client                                         Server
  │                                              │
  │── SYN, seq=x ─────────────────────────────►  │
  │                                              │  allocate connection state,
  │                                              │  choose ISN y
  │                                              │
  │  ◄────────────── SYN, ACK, seq=y, ack=x+1 ── │
  │                                              │
  allocate connection state,                     │
  advance client window                          │
  │                                              │
  │── ACK, seq=x+1, ack=y+1 ──────────────────►  │
  │                                              │
  │══════════════  data flows  ══════════════════│

  • SYN (synchronize): a packet with the SYN flag set in the TCP header. The client picks an Initial Sequence Number (ISN) x, and this packet says: “I want to open a connection. My sequence numbering starts at x.” The SYN itself consumes sequence number x, so the first data byte will be numbered x + 1.
  • SYN-ACK: the server’s response. It picks its own ISN y for the other direction of the connection and acknowledges the client’s SYN by setting ack = x + 1, the next sequence number it expects. Two flags are set at once: SYN (“I’m also synchronizing from y”) and ACK (“I received your SYN, numbered x”).
  • ACK: the client confirms the server’s ISN with ack = y + 1. No SYN flag this time. The handshake is complete.

After the third packet, both sides have agreed on: the connection’s endpoints, the two independent sequence number spaces, and the fact that each side can successfully send to the other.
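The arithmetic in the three packets above is small enough to sketch directly. This is a toy model, not a network stack: the `Segment` class and `three_way_handshake` function are names made up for illustration, and only the flag bit values (SYN = 0x02, ACK = 0x10) and the 32-bit wraparound come from TCP itself.

```python
from dataclasses import dataclass
import secrets

SYN, ACK = 0x02, 0x10   # flag bit values in the TCP header
MOD = 2**32             # sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    flags: int
    seq: int
    ack: int = 0


def three_way_handshake(x: int, y: int) -> list[Segment]:
    """Return the three handshake segments for client ISN x and server ISN y."""
    syn = Segment(SYN, seq=x)
    # The SYN consumes one sequence number, so the server expects x + 1 next.
    syn_ack = Segment(SYN | ACK, seq=y, ack=(syn.seq + 1) % MOD)
    # The final ACK carries no SYN flag and acknowledges the server's SYN.
    ack = Segment(ACK, seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    x, y = secrets.randbits(32), secrets.randbits(32)
    for seg in three_way_handshake(x, y):
        print(f"flags={seg.flags:#04x} seq={seg.seq} ack={seg.ack}")
```

Note the modulo: if the server happens to pick the largest possible ISN, the client's acknowledgment wraps to 0, which is perfectly legal.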

#Why Two Packets Isn’t Enough

The obvious question: why three? Wouldn’t SYN + SYN-ACK be enough — the client says “let’s talk,” the server says “okay from y,” and they’re done?

The problem is stale packets. Imagine the client opened a connection yesterday, sent a SYN with ISN x, received no answer, and gave up. Today, the same client opens a new connection and sends a SYN with a new ISN x'. Meanwhile, yesterday’s original SYN, which was not lost but merely stuck in some router’s queue, finally arrives at the server.

If the protocol were two-way, the server would see a SYN with ISN x, allocate state, respond with SYN-ACK, and start waiting for data bytes starting at x+1. The client, who knows nothing about this, sends data on its new connection starting at x'+1. The server receives bytes with sequence numbers it isn’t expecting and discards them — or worse, accepts them into the wrong conversation.

The third packet fixes this. The server doesn’t fully open the connection until the client confirms the server’s ISN. A stale SYN can still reach the server and trigger a SYN-ACK response, but the client never sends the final ACK, because it knows nothing about this phantom connection. In standard TCP the client answers the unexpected SYN-ACK with a RST, which tells the server to drop the half-open state right away; if that RST is lost too, the server times out and tears the state down anyway.
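To make the difference concrete, here is a minimal sketch of a server that can run in either mode. Everything in it is a simplification invented for the example: the `Server` class is hypothetical, connections are keyed by client ISN rather than by the real four-tuple, and the server ISNs are deterministic so the result is easy to read.

```python
class Server:
    def __init__(self, require_final_ack: bool):
        self.require_final_ack = require_final_ack
        self.half_open = {}      # client ISN -> server ISN, awaiting the ACK
        self.established = set()
        self._next_isn = 5000    # deterministic for the example only

    def on_syn(self, client_isn: int) -> int:
        server_isn = self._next_isn
        self._next_isn += 1000
        if self.require_final_ack:
            self.half_open[client_isn] = server_isn
        else:
            # Two-way variant: the SYN alone opens the connection.
            self.established.add(client_isn)
        return server_isn        # would travel back in the SYN-ACK

    def on_ack(self, client_isn: int, ack: int) -> bool:
        # Promote only if the ACK proves the client saw our SYN-ACK.
        if self.half_open.get(client_isn) == ack - 1:
            del self.half_open[client_isn]
            self.established.add(client_isn)
            return True
        return False


def run(require_final_ack: bool):
    server = Server(require_final_ack)
    stale_isn, fresh_isn = 111, 222
    server.on_syn(stale_isn)          # delayed SYN from the abandoned attempt
    y = server.on_syn(fresh_isn)      # the SYN the client actually means
    server.on_ack(fresh_isn, y + 1)   # the client only ACKs its current attempt
    return server.established, set(server.half_open)


if __name__ == "__main__":
    print("two-way:  ", run(require_final_ack=False))
    print("three-way:", run(require_final_ack=True))
```

In two-way mode the stale SYN ends up as an established connection nobody asked for. In three-way mode it stays in the half-open table, where a RST or a timeout can clean it up.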

The formal version of this argument is sometimes called the “two-generals problem” applied to connection setup. You can never guarantee both sides have perfect mutual knowledge with a finite message exchange, but three packets get you close enough for practical purposes.

#Initial Sequence Numbers and the Mitnick Attack

For a long time, the ISN was generated naively. Early BSD implementations incremented a global counter by a fixed amount every second, and by another fixed amount per new connection. If you knew when a machine last booted and how many connections it had serviced, you could predict its next ISN.

This sounds academic until you consider what TCP implicitly assumes: only someone who has seen the server’s SYN-ACK knows the server’s ISN. That’s what proves you’re able to receive traffic at the address you claimed. If the server’s ISN is predictable, an attacker can forge the third packet of the handshake without ever receiving the second.
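A short sketch shows how little work the prediction takes. The increments below (128 per second, 64 per connection) are assumed values chosen for illustration; the text above only says the amounts were fixed. The function names are hypothetical as well.

```python
PER_SECOND = 128       # assumed increment per second of uptime
PER_CONNECTION = 64    # assumed increment per new connection
MOD = 2**32


def naive_isn(uptime_s: int, connections_so_far: int) -> int:
    """ISN a counter-based stack would hand out at this moment."""
    return (PER_SECOND * uptime_s + PER_CONNECTION * connections_so_far) % MOD


def predict_next(observed_isn: int, seconds_elapsed: int,
                 connections_since: int = 1) -> int:
    """Guess the next ISN from one sample the attacker observed legitimately."""
    return (observed_isn
            + PER_SECOND * seconds_elapsed
            + PER_CONNECTION * connections_since) % MOD


if __name__ == "__main__":
    # The attacker opens one real connection and notes the ISN in the SYN-ACK.
    sample = naive_isn(uptime_s=90_000, connections_so_far=41)
    # Two seconds later the victim hands out its next ISN to someone else.
    actual = naive_isn(uptime_s=90_002, connections_so_far=42)
    print(predict_next(sample, seconds_elapsed=2) == actual)
```

The attacker does not need the boot time at all in this version: one honest connection yields a sample, and the rest is addition.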

In 1994, Kevin Mitnick used exactly this technique to break into Tsutomu Shimomura’s workstation. The attack:

  1. Mitnick’s system sends a SYN to Shimomura’s machine, claiming to come from a trusted machine’s IP address.
  2. Shimomura’s machine sends a SYN-ACK back to the trusted machine (not Mitnick).
  3. Mitnick had already flooded the trusted machine with SYNs to make it ignore incoming packets.
  4. Mitnick guesses Shimomura’s ISN based on timing and sends the final ACK himself, spoofing the trusted machine’s source address.
  5. Shimomura’s machine now believes it has an open TCP connection with the trusted machine, and starts accepting commands.

The fix was randomized ISNs. RFC 1948 (1996) proposed deriving the ISN from a cryptographic hash of the connection tuple plus a secret, and RFC 6528 (2012) moved that scheme onto the standards track, making ISNs effectively unguessable to an off-path attacker. Every mainstream OS now does some version of this; a stack with predictable ISNs is not safe to expose on the public internet.
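RFC 6528 writes the scheme as ISN = M + F(localip, localport, remoteip, remoteport, secretkey), where M is a timer that ticks every 4 microseconds and F is a keyed pseudorandom function. The sketch below follows that shape. Using HMAC-SHA-256 for F is an assumption made here, not something the RFC mandates, and `SECRET` and `isn` are illustrative names.

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(16)   # hypothetical per-boot secret


def isn(local_ip, local_port, remote_ip, remote_port,
        secret=SECRET, now=None) -> int:
    """ISN = M + F(connection tuple, secret), reduced to 32 bits."""
    now = time.monotonic() if now is None else now
    m = int(now * 1_000_000) // 4          # M: timer ticking every 4 microseconds
    tuple_bytes = f"{local_ip}:{local_port}|{remote_ip}:{remote_port}".encode()
    digest = hmac.new(secret, tuple_bytes, hashlib.sha256).digest()
    f = int.from_bytes(digest[:4], "big")  # F: keyed hash truncated to 32 bits
    return (m + f) % 2**32


if __name__ == "__main__":
    print(isn("203.0.113.5", 443, "198.51.100.7", 40000))
    print(isn("203.0.113.5", 443, "198.51.100.7", 40001))
```

Within one connection tuple the ISN still advances with the clock, which keeps old duplicates distinguishable. Across tuples the offset depends on a secret the attacker does not have, so sampling one connection says nothing useful about another.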

#SYN Floods and Half-Open Connections

There’s a second attack the handshake opened the door to. When a server receives a SYN, it allocates connection state — a slot in a kernel data structure — and waits for the final ACK. If the ACK never comes, the slot stays reserved until a timeout.

In 1996, attackers started flooding servers with SYN packets from spoofed IP addresses. The server would send SYN-ACKs that went nowhere, and its half-open connection table would fill up. Legitimate clients couldn’t get a slot; the server effectively stopped accepting connections.

The defense was SYN cookies, invented by Daniel J. Bernstein and Eric Schenk. Instead of allocating state on the first SYN, the server encodes the connection parameters into its own ISN using a cryptographic hash. If the client ACKs with a valid cookie, the server can reconstruct the state; if not, nothing was allocated in the first place. This turned the handshake from a stateful operation into a stateless one during overload, and SYN floods became a much less effective attack.
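Here is a simplified sketch of the idea. The 5/3/24 bit split (time slot, MSS index, hash) follows the commonly described layout, but the MSS table, the use of HMAC-SHA-256, and the function names are assumptions for illustration; real kernels differ in the details.

```python
import hashlib
import hmac

MSS_TABLE = [536, 1220, 1440, 1460]   # hypothetical table of encodable MSS values


def _mac(secret: bytes, conn: tuple, counter: int) -> int:
    """24-bit keyed hash over the connection identity and the time slot."""
    msg = f"{conn}|{counter}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(secret: bytes, conn: tuple, mss_index: int, counter: int) -> int:
    """Build the server ISN: 5 bits of time slot, 3 bits of MSS index, 24 bits of MAC."""
    return ((counter % 32) << 27) | (mss_index << 24) | _mac(secret, conn, counter)


def check_cookie(secret: bytes, conn: tuple, ack_number: int, counter: int):
    """Return the negotiated MSS if the final ACK carries a valid cookie, else None."""
    cookie = (ack_number - 1) % 2**32         # the client acknowledges ISN + 1
    t, mss_index, mac = cookie >> 27, (cookie >> 24) & 0x7, cookie & 0xFFFFFF
    if mss_index >= len(MSS_TABLE):
        return None
    for c in (counter, counter - 1):          # accept the current or previous slot
        if c % 32 == t and _mac(secret, conn, c) == mac:
            return MSS_TABLE[mss_index]
    return None


if __name__ == "__main__":
    secret = b"k" * 16
    # Tuple: client address and port, server address and port, client ISN.
    conn = ("198.51.100.7", 40000, "203.0.113.5", 443, 123456)
    cookie = make_cookie(secret, conn, mss_index=3, counter=1000)
    print(check_cookie(secret, conn, (cookie + 1) % 2**32, counter=1000))
```

The server keeps nothing between `make_cookie` and `check_cookie` except the secret and a coarse clock, which is the whole point: a spoofed SYN costs it one hash and one outgoing packet.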

#Closing a Connection

The handshake has a symmetric counterpart at connection teardown, sometimes called the four-way handshake:

  Client                                   Server
    │── FIN ────────────────────────────►    │
    │  ◄─────────────────────── ACK ──────    │

    │   (server still has data to send)

    │  ◄─────────────────────── FIN ──────    │
    │── ACK ────────────────────────────►    │

TCP connections are full-duplex and can be closed independently in each direction. The client FINs its direction, the server ACKs it — but the server can keep sending for as long as it wants, until it sends its own FIN. The four-way is really two one-way closes, each with its own ACK.
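The half-close is visible from ordinary socket code. The sketch below runs a throwaway server on the loopback interface, so it assumes loopback TCP is available; the function names are made up for the example. `shutdown(SHUT_WR)` sends the client's FIN while leaving its receive side open.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    conn, _ = listener.accept()
    with conn:
        received = b""
        # recv() returns b"" once the client's FIN has arrived.
        while chunk := conn.recv(1024):
            received += chunk
        # The client's direction is closed, but ours is still open.
        conn.sendall(b"got " + received)
    # Leaving the with-block closes the socket, which sends the server's FIN.


def half_close_demo() -> bytes:
    listener = socket.create_server(("127.0.0.1", 0))   # port 0: any free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"hello")
        client.shutdown(socket.SHUT_WR)   # send our FIN; we can still receive
        reply = b""
        while chunk := client.recv(1024):
            reply += chunk
    worker.join(timeout=5)
    listener.close()
    return reply


if __name__ == "__main__":
    print(half_close_demo())
```

The server only starts replying after it has seen the client's FIN, and the client still receives the full reply, which is the "two one-way closes" behavior in action.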

After the last ACK, the side that closed first enters the famous TIME_WAIT state, typically 60 to 120 seconds, during which it refuses to reuse the same combination of local and remote address and port. This exists precisely to catch stray packets from the just-closed connection, so they don’t accidentally land in a brand-new connection reusing the same four-tuple. TIME_WAIT is why high-traffic machines that open many outbound connections, such as proxies and load balancers, can run out of ephemeral ports under load; it’s a direct consequence of TCP’s guarantees against stale packets.
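The port-exhaustion ceiling is simple division. The numbers below are assumptions for illustration: an ephemeral range of 32768 to 60999 (a common Linux default) and a 60-second TIME_WAIT. The limit applies per destination address and port, since that is what completes the four-tuple.

```python
def max_new_connections_per_second(port_low: int, port_high: int,
                                   time_wait_s: float) -> float:
    """Sustained rate of new outbound connections to ONE destination.

    Each closed connection pins its local port for time_wait_s seconds,
    so at most (number of ports) connections can be opened per window.
    """
    ports = port_high - port_low + 1
    return ports / time_wait_s


if __name__ == "__main__":
    rate = max_new_connections_per_second(32768, 60999, time_wait_s=60)
    print(f"{rate:.0f} new connections per second")
```

Under those assumptions the ceiling is a few hundred connections per second to a single backend, which is why connection pooling matters so much for proxies.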

#Making the Handshake Go Away

The three-way handshake costs one round-trip of latency before any data can flow. On a local network that’s invisible; on a satellite link it’s 500+ ms of dead time every time a connection opens.

Several things have tried to get around it:

  • TCP Fast Open (RFC 7413) lets a client that’s talked to a server before send data in the SYN packet itself, using a cryptographic cookie from the previous connection as proof it’s not an attacker. It works but is inconsistently supported by middleboxes.
  • TLS 1.3 0-RTT allows a client resuming a session to send encrypted application data in the first TLS message — but at the cost of weaker replay protection for that first message.
  • QUIC folds the transport and cryptographic handshakes together. For a resumed connection, QUIC can be effectively 0-RTT: the first packet contains data.

All of these share a theme: the three-way handshake’s cost was acceptable when round-trips were 10 ms. It isn’t when your service is global and your p99 RTT is 200 ms. The handshake hasn’t been replaced — it’s been amortized.
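A back-of-the-envelope sketch makes the amortization visible. The table counts round trips spent on setup before the client may send application data; the TCP, Fast Open, TLS 1.3 0-RTT and QUIC 0-RTT rows follow the descriptions above, and the full TLS 1.3 row adds the one extra round trip of a non-resumed TLS 1.3 handshake. The names are illustrative.

```python
# Round trips spent on setup before the client may send application data.
SETUP_ROUND_TRIPS = {
    "TCP": 1,
    "TCP + TLS 1.3 (full handshake)": 2,
    "TCP + TLS 1.3 0-RTT (resumed)": 1,
    "TCP Fast Open (repeat visit)": 0,
    "QUIC 0-RTT (resumed)": 0,
}


def setup_delay_ms(rtt_ms: float) -> dict[str, float]:
    """Dead time before the first byte of application data, per option."""
    return {name: trips * rtt_ms for name, trips in SETUP_ROUND_TRIPS.items()}


if __name__ == "__main__":
    for rtt in (10, 200, 500):
        print(f"RTT {rtt} ms")
        for name, delay in setup_delay_ms(rtt).items():
            print(f"  {name:<32} {delay:>6.0f} ms")
```

At 10 ms the rows are indistinguishable in practice; at 200 ms or on a satellite link the gap between two round trips and zero is most of what a user perceives as "slow to connect".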

#Fifty Years of Three Packets

The TCP/IP post shows the handshake diagram in four lines. Those four lines have been the starting point of every TCP connection ever made, from the first TCP specification in 1974 to a mobile phone in 2026 and everything running in a rack somewhere in between. The details have been refined: sequence numbers are cryptographic now, cookies handle overload, and modern transports fold the handshake into a longer negotiation with encryption built in. But the structure is the same:

  • Three packets, because two isn’t enough to rule out stale state.
  • Two independent sequence spaces, because the conversation is full-duplex.
  • A connection fully opened only after both sides confirm liveness, because committing on the first packet alone is exploitable.

Every property of TCP that follows — reliability, ordering, flow control, congestion control — depends on the handshake establishing that shared coordinate system first. It’s the foundation you can’t see once the connection is open. It’s also the first thing any new transport has to decide whether to keep.