The RSA Math, Line by Line

#TL;DR

RSA’s key generation takes eight lines of Python. Those eight lines encode 200 years of number theory: Fermat’s little theorem, Euler’s totient function, Euler’s theorem, Bézout’s identity, and the extended Euclidean algorithm. This post walks through the math behind the equation (m^e)^d ≡ m (mod n) — why it works, what each piece is doing, and why e = 65537 is the industry-standard public exponent. None of it requires math beyond high school algebra. All of it is necessary for RSA to be anything other than guesswork.

#Setup: Modular Arithmetic

Everything in RSA happens modulo a number. Working modulo n means you only care about remainders when you divide by n. Two numbers are “the same mod n” if their difference is a multiple of n:

17 ≡ 5 (mod 12)          because 17 - 5 = 12
100 ≡ 4 (mod 12)         because 100 - 4 = 96 = 8 × 12
-1 ≡ 11 (mod 12)         because -1 - 11 = -12

Modular arithmetic preserves addition and multiplication. If a ≡ b (mod n) and c ≡ d (mod n), then a + c ≡ b + d (mod n) and a × c ≡ b × d (mod n). This means you can reduce at any point — you don’t have to compute huge intermediate values as long as you keep taking mod n along the way.

Exponentiation reduces too: a^k mod n can be computed without ever letting a^k get large. Exponentiation by squaring computes it in roughly log(k) steps:

def mod_exp(base, exp, mod):
    result = 1
    base = base % mod
    while exp > 0:
        if exp & 1:
            result = (result * base) % mod
        exp >>= 1
        base = (base * base) % mod
    return result

mod_exp(65, 17, 3233)  # 65^17 mod 3233 = 2790

Python’s built-in pow(base, exp, mod) does this for you. It’s the engine that makes RSA practical on real-world key sizes — raising a 2048-bit number to a 2048-bit power without intermediate blowup.

#Fermat’s Little Theorem

In 1640, Pierre de Fermat stated a result that would turn out to be one of the cornerstones of public-key cryptography:

If p is prime and a is not divisible by p, then a^(p-1) ≡ 1 (mod p).

For example, with p = 7 and a = 3:

3^6 = 729 = 104 × 7 + 1
so 3^6 ≡ 1 (mod 7)

Check any a from 1 to 6, and a^6 mod 7 is always 1. This isn’t a coincidence — it reflects the structure of the multiplicative group of integers mod p.

Fermat’s theorem gets close to what RSA needs, but not quite. RSA doesn’t work modulo a prime — it works modulo n = pq, a product of two primes. For that, we need the generalization.

#Euler’s Totient and Euler’s Theorem

The generalization came from Leonhard Euler a century later. He defined the totient function φ(n) as the count of positive integers less than n that are coprime to n (share no factors with n other than 1):

φ(1) = 1
φ(7) = 6    (1, 2, 3, 4, 5, 6 all coprime to 7)
φ(12) = 4   (1, 5, 7, 11 coprime to 12)

When p is prime, everything from 1 to p-1 is coprime to p, so φ(p) = p - 1. This recovers Fermat.

The important property for RSA is how φ behaves on products of coprime numbers:

φ(mn) = φ(m) × φ(n)        when gcd(m, n) = 1

In particular, for two distinct primes p and q:

φ(pq) = φ(p) × φ(q) = (p - 1)(q - 1)

This is the formula RSA key generation uses. If you know p and q, you can compute φ(n) directly. If you only know n = pq, you’d have to factor n first — and factoring large n is the problem RSA assumes is hard.

Euler’s theorem generalizes Fermat:

If gcd(a, n) = 1, then a^φ(n) ≡ 1 (mod n).

For n = pq with p and q distinct primes:

a^((p-1)(q-1)) ≡ 1 (mod pq)     when gcd(a, pq) = 1

This is the equation everything else in RSA is built on.

#The RSA Identity: Why `(m^e)^d ≡ m (mod n)`

RSA picks a public exponent e and a private exponent d such that:

e × d ≡ 1 (mod φ(n))

That is, d is the modular inverse of e modulo φ(n). It exists if and only if gcd(e, φ(n)) = 1, which is why e must be coprime to φ(n).

Given that relationship, here’s why encrypting-then-decrypting gets you back the original message. For any message m:

(m^e)^d = m^(ed)                             by exponent rules
        = m^(1 + k·φ(n))                     because ed ≡ 1 (mod φ(n)), so ed = 1 + k·φ(n) for some integer k
        = m · m^(k·φ(n))                     splitting the exponent
        = m · (m^φ(n))^k                     again exponent rules
        ≡ m · (1)^k  (mod n)                 by Euler's theorem, if gcd(m, n) = 1
        ≡ m (mod n)

The encryption scrambles with e. The decryption un-scrambles with d. The two operations cancel because ed is “almost 1” (specifically: 1 plus a multiple of φ(n)), and Euler’s theorem says raising anything coprime to n by a multiple of φ(n) gives you 1.

The edge case — when gcd(m, n) ≠ 1, i.e., m is a multiple of p or q — actually still works out by a separate argument using the Chinese Remainder Theorem. But this case is cryptographically uninteresting: if your “message” happens to be a multiple of one of the prime factors of n, you’ve just revealed that factor, which is catastrophic anyway.

#Finding `d`: The Extended Euclidean Algorithm

Given e and φ(n), how do we compute d such that e × d ≡ 1 (mod φ(n))?

The tool is the extended Euclidean algorithm. The standard Euclidean algorithm finds gcd(a, b) by repeated division. The extended version additionally computes integers x and y such that:

a × x + b × y = gcd(a, b)                    Bézout's identity

If gcd(e, φ(n)) = 1, then there exist integers x and y with:

e × x + φ(n) × y = 1

Taking both sides mod φ(n):

e × x ≡ 1 (mod φ(n))

So d = x mod φ(n). Python has this built in as pow(e, -1, phi) since 3.8:

def extended_gcd(a, b):
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extended_gcd(b, a % b)
    return g, y1, x1 - (a // b) * y1

phi = 3120
e = 17
g, d, _ = extended_gcd(e, phi)
d = d % phi   # normalize to [0, phi)
# d = 2753, and indeed 17 * 2753 = 46801 = 15 * 3120 + 1

This is how the magic number d = 2753 in the RSA post actually gets computed. There’s no guessing, no search — it’s a deterministic consequence of the extended Euclidean algorithm running on the public exponent and the totient.

#Why `e = 65537`

In practice, the public exponent e isn’t picked randomly. There’s a strong convention — almost a universal convention — that e = 65537. Why this specific number?

65537 is 2^16 + 1. It has three useful properties:

It’s prime. A prime e is automatically coprime to any φ(n) (unless φ(n) is divisible by 65537, which is vanishingly unlikely for random primes p and q).
Its binary representation is sparse. 65537 in binary is 10000000000000001. Exponentiation by squaring with a 17-bit exponent requires only 17 squarings and 1 multiplication, making public-key operations (encryption and signature verification) very fast.
It’s large enough to avoid small-exponent attacks. The notorious low-exponent attacks on RSA (especially with e = 3) exploit the fact that if the message m is small enough, m^e might be less than n, meaning the modular reduction doesn’t actually reduce anything. An attacker can then just compute the e-th root of the ciphertext over the integers. e = 65537 is big enough that m^e effectively always exceeds n for any non-degenerate message, closing this attack.

You can pick other values of e, but 65537 is the Goldilocks choice: prime, fast to exponentiate, and big enough to be secure. Every TLS library you’ve ever used is probably generating RSA keys with e = 65537.

#The Primes `p` and `q`

Key generation picks two primes and multiplies them. A few things matter about how they’re picked:

p and q must be large. The security of RSA depends on factoring n = pq being infeasible. Modern keys use 1024-bit primes, producing 2048-bit n. 4096-bit keys are becoming standard for long-lived keys.
p and q must be roughly the same size (within a few bits). If one is much smaller than the other, Fermat’s factoring method or the elliptic-curve method can find the smaller factor quickly.
p - 1 and q - 1 shouldn’t have only small factors. Otherwise Pollard’s p-1 algorithm can factor n efficiently. “Strong primes” were designed to avoid this, though modern RSA implementations generally treat this as a solved non-issue because the probability of accidentally hitting a weak prime with random generation is vanishingly small.
p and q must be generated independently and randomly. If two RSA keys in the wild happen to share a prime factor, gcd(n1, n2) instantly reveals the shared factor and breaks both keys. This has happened in practice — in 2012 researchers found that a non-trivial fraction of HTTPS keys on the public internet shared factors with each other due to broken RNGs in embedded devices.

Primes are generated by picking a random odd number of the right bit length and running a probabilistic primality test (usually Miller-Rabin) until one passes. A 1024-bit random odd number has roughly a 1-in-700 chance of being prime, so you test a few hundred candidates before you find one.

#Padding: The Part the Math Skips

The bare mathematical RSA described here — “encrypt m as m^e mod n” — is called textbook RSA, and it is not safe to use in practice. It has several weaknesses:

It’s deterministic. Encrypting the same message twice gives the same ciphertext. An attacker can tell when you’ve sent the same message again, and can build dictionaries.
It’s malleable. Given c = m^e mod n, an attacker can compute c × 2^e mod n, which decrypts to 2m mod n — a legitimate-looking ciphertext the attacker just constructed out of yours.
Short messages are vulnerable to small-exponent attacks.
Identical messages sent to multiple recipients (Håstad’s attack) can sometimes be recovered.

Real RSA wraps the message in a padding scheme that randomizes and structures the input before exponentiation. Modern RSA uses OAEP (Optimal Asymmetric Encryption Padding) for encryption and PSS (Probabilistic Signature Scheme) for signatures. Older systems used PKCS#1 v1.5 padding, which has had multiple attacks against it (Bleichenbacher, 1998) that keep getting rediscovered in new implementations.

The point: the math we just walked through is the primitive. The cryptosystem that uses it is a lot more careful. The RSA post quietly elides this distinction in the name of clarity; the real difference between textbook RSA and RSA-OAEP is decades of attacks and patches.

#What the Math Actually Proves

If you’ve followed this: the security of RSA depends on one conjecture — that factoring large n is computationally infeasible — plus a handful of theorems that are about 300 years old. The conjecture is what a quantum computer would break. The theorems are the same ones Euler proved in the 18th century and are not going anywhere.

The RSA post describes the key generation as “the math, the proof, the algorithm” that Rivest worked out after a Passover seder. What he assembled that night was specifically: Euler’s theorem from 1761, the extended Euclidean algorithm from 300 BC, and a choice of primes whose product would make the factoring problem hard. The brilliance was in the synthesis. The pieces were already there — he just had to see that they fit together into a lock and key.

Fifty years later, everyone who’s ever secured a web connection has been using those same eight lines.