Workshop · 1,445 words · 6 min read

Git: Distributed Version Control for the Angry Finn

Linus Torvalds built Git in about two weeks out of frustration and, almost by accident, created the tool that powers nearly all modern software development.

#TL;DR

In April 2005, the Linux kernel community lost access to BitKeeper, the proprietary version control system they’d been using for three years. Linus Torvalds, frustrated with every existing alternative, spent two weeks building his own. His design priorities were extreme: speed (operations should take milliseconds, not seconds), data integrity (every bit of history is cryptographically verified), and full distribution (every clone is a complete repository with complete history). The result was Git — a tool so fast and so different from its predecessors that it took years for the industry to understand it. Git didn’t just replace CVS and Subversion. Combined with GitHub (2008), it reinvented how software is built: pull requests, code review, open-source collaboration, CI/CD pipelines — the entire modern development workflow runs on Git’s model of distributed, branching, content-addressed history.

#The BitKeeper Crisis

The Linux kernel is one of the largest collaborative software projects in history. By 2005, thousands of developers were contributing to a codebase with millions of lines. Managing those contributions required version control.

For years, Torvalds had resisted using any version control system. He’d accepted patches by email, applied them manually, and released tarballs. When the project finally adopted BitKeeper in 2002 — a commercial distributed VCS that offered free licenses to open-source projects — it was a pragmatic compromise.

Then, in early 2005, Andrew Tridgell (creator of Samba) reverse-engineered parts of the BitKeeper protocol. In April, Larry McVoy, whose company BitMover made BitKeeper, withdrew the free licenses in response. The Linux kernel community was suddenly without version control.

Torvalds looked at the alternatives — CVS, Subversion, Monotone, Darcs — and found them all unacceptable. CVS and Subversion were centralized and slow. Monotone had the right model but was too slow for the kernel’s scale. None of them met his three requirements: speed, data integrity, and distribution.

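So he built his own. The first commit in Git’s own repository is dated April 7, 2005. By April 16, the Linux kernel tree itself had been imported into Git (the kernel’s Git history starts with the 2.6.12-rc2 commit of that day), and by mid-June Git was managing the 2.6.12 release. Two weeks from nothing to production.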

#Content-Addressed Storage

Git’s foundational design choice is that everything is identified by the SHA-1 hash of its content. A file isn’t stored by its name; it’s stored under the hash of what it contains (prefixed with a short header giving the object type and size). A commit isn’t identified by a sequential number; it’s identified by a hash of the commit message, the author and committer, the timestamps, the parent commit, and the tree of files it points to.

$ echo "hello" | git hash-object --stdin
ce013625030ba8dba906f756967f9e9ca394464a

$ echo "hello!" | git hash-object --stdin
8f65026e5e723ba62e53416b48bab21341fa9a27

Add a single character, and the hash is completely different. This means:

  • Every object is immutable — once stored, its content can never change without changing its hash
  • Integrity is automatic — if a bit flips on disk or during transfer, the hash won’t match, and Git will detect the corruption
  • Deduplication is free — identical content always produces the same hash, so it’s stored only once
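The hash Git prints is not the SHA-1 of the raw bytes alone: Git first prepends a header, `blob <size-in-bytes>` followed by a NUL byte. The Python sketch below reproduces that calculation with the standard `hashlib` module and wraps it in a toy in-memory store to show the deduplication and corruption-detection properties listed above. The `ObjectStore` class and its methods are hypothetical illustrations, not part of Git; real Git keeps zlib-compressed objects under `.git/objects`.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID Git gives a blob: SHA-1 over 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy in-memory content-addressed store (hypothetical; not Git's on-disk format)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Identical content always yields the same key, so it is stored once.
        self.objects.setdefault(oid, content)
        return oid

    def get(self, oid: str) -> bytes:
        content = self.objects[oid]
        # Re-hash on every read: a single flipped bit changes the ID.
        if blob_id(content) != oid:
            raise ValueError(f"object {oid} is corrupt")
        return content


if __name__ == "__main__":
    store = ObjectStore()
    # `echo "hello"` appends a newline, so the bytes hashed are b"hello\n".
    first = store.put(b"hello\n")
    second = store.put(b"hello\n")
    print(first)  # ce013625030ba8dba906f756967f9e9ca394464a
    print(first == second, len(store.objects))  # True 1
    store.objects[first] = b"helln\n"  # flip one bit: 'o' (0x6f) -> 'n' (0x6e)
    try:
        store.get(first)
    except ValueError as err:
        print(err)
```

The first printed ID matches the `git hash-object` output shown earlier, which is a quick way to confirm the header format.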

Git’s object store contains only four types: blobs (file contents), trees (directories, i.e. lists of names pointing to blobs and other trees), commits (a tree pointer, zero or more parent pointers, author, committer, and message), and annotated tags (a named, optionally signed pointer to another object, usually a commit).

commit a1b2c3
├── tree d4e5f6  (root directory)
│   ├── blob 7a8b9c  README.md
│   └── tree 1d2e3f  src/
│       ├── blob 4a5b6c  main.js
│       └── tree 7d8e9f  lib/
│           └── blob 0a1b2c  utils.js
└── parent commit 9e0f1a

A commit points to a tree. A tree points to blobs and other trees. Everything is a hash. The entire history of a project is a directed acyclic graph (DAG) of content-addressed objects.
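Trees and commits are hashed the same way as blobs, just with different headers and bodies, and that is what makes the history a hash-linked DAG. The sketch below builds the directory layout from the diagram and hashes it using Git’s object layouts as I understand them: a tree entry is `<mode> <name>`, a NUL byte, and the child’s 20-byte binary ID; a commit is a short text header followed by the message. The helper names, file contents, author identity, and timestamp are all made-up placeholders, and only regular files and subdirectories are handled.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash any Git object: SHA-1 over '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: dict[str, tuple[str, str]]) -> str:
    """Hash a tree; entries maps one path component -> ("blob" | "tree", child ID)."""
    def sort_key(name: str) -> str:
        # Git sorts entries by name, comparing subdirectories as if they ended in '/'.
        return name + "/" if entries[name][0] == "tree" else name

    body = b""
    for name in sorted(entries, key=sort_key):
        kind, child = entries[name]
        mode = "40000" if kind == "tree" else "100644"  # directory vs. regular file
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(child)
    return object_id("tree", body)


def commit_id(tree: str, parents: list[str], message: str) -> str:
    """Hash a commit; the identity and timestamp below are made-up placeholders."""
    who = "Ada Example <ada@example.com> 1112911993 -0700"
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return object_id("commit", ("\n".join(lines) + "\n").encode())


def snapshot(utils_js: bytes) -> tuple[str, str]:
    """Build README.md, src/main.js, src/lib/utils.js; return (root tree ID, commit ID)."""
    lib = tree_id({"utils.js": ("blob", object_id("blob", utils_js))})
    src = tree_id({
        "main.js": ("blob", object_id("blob", b"console.log('hi');\n")),
        "lib": ("tree", lib),
    })
    root = tree_id({
        "README.md": ("blob", object_id("blob", b"# demo\n")),
        "src": ("tree", src),
    })
    return root, commit_id(root, [], "Initial commit")


if __name__ == "__main__":
    print(object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty tree)
    before = snapshot(b"export const x = 1;\n")
    after = snapshot(b"export const x = 2;\n")
    # One edit deep in src/lib/ changes lib/, src/, the root tree and the commit.
    print(before[0] != after[0], before[1] != after[1])  # True True
```

Because each parent object embeds its children’s IDs, a change anywhere below a tree propagates up to the root tree and into the commit ID.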

#Branches Are Cheap

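In Subversion, a branch was a copy of the project directory inside the repository. The copy itself was cheap on the server, but every branch lived at its own path and merging it back was painful (Subversion had no merge tracking until version 1.5 in 2008), so teams avoided branching. In Git, a branch is a 41-byte file: a 40-character commit hash plus a newline.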

$ cat .git/refs/heads/main
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0

$ git branch feature-x
# creates .git/refs/heads/feature-x pointing to the same commit

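Creating a branch just writes that one small file, so it is effectively instant. Switching branches rewrites only the files that differ between the two commits, pulling their contents from the local object store, so it is usually fast as well. This cheapness changed how people work: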

# The modern Git workflow
git checkout -b fix-login-bug     # branch per task (microseconds)
# ... make changes, commit ...
git push origin fix-login-bug     # push to remote
# open a pull request, get review
git checkout main
git merge fix-login-bug           # merge when approved
git branch -d fix-login-bug       # clean up

Branches went from a heavyweight operation you did quarterly (SVN release branches) to something you did dozens of times a day. This enabled feature branches, pull requests, and the entire GitHub-style collaboration workflow.
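The mechanics behind that cheapness are small enough to model directly. The sketch below uses plain file I/O in a temporary directory: a branch is one file under `refs/heads/` holding a commit ID and a newline, and `HEAD` is a symbolic ref that names the current branch. The helper names are hypothetical, and real Git adds details this skips (packed refs, reflogs, lock files, and updating the working tree when you switch).

```python
import tempfile
from pathlib import Path


def create_branch(git_dir: Path, name: str, commit: str) -> Path:
    """Store a branch as a loose ref: one file holding one commit ID."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_bytes((commit + "\n").encode())  # 40 hex chars + newline = 41 bytes
    return ref


def point_head_at(git_dir: Path, name: str) -> None:
    """Make HEAD a symbolic ref naming the current branch."""
    (git_dir / "HEAD").write_bytes(f"ref: refs/heads/{name}\n".encode())


def resolve_head(git_dir: Path) -> str:
    """Follow HEAD to its branch file and return the commit ID stored there."""
    head = (git_dir / "HEAD").read_text().strip()
    target = head.removeprefix("ref: ")
    return (git_dir / target).read_text().strip()


if __name__ == "__main__":
    commit = "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0"  # placeholder hash
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)
        create_branch(git_dir, "main", commit)
        ref = create_branch(git_dir, "feature-x", commit)
        point_head_at(git_dir, "feature-x")
        print(ref.stat().st_size, resolve_head(git_dir) == commit)  # 41 True
```

Nothing in the object store is copied when a branch is created; both branches simply name the same commit until one of them moves.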

#The Distributed Model

In centralized version control (CVS, SVN), there’s one server with the repository. Developers check out a working copy, make changes, and commit back to the server. No network, no commits.

Git inverts this. Every clone is a full repository with complete history:

          [origin] (GitHub, GitLab, etc.)
           /     |     \
     [alice]  [bob]  [carol]
       |        |       |
    full      full     full
    history   history  history

You can commit, branch, merge, log, diff, bisect, and blame entirely offline. The remote repository is just another repository, a meeting point for exchanging changes rather than a central authority. git fetch and git push copy whatever objects the receiving side is missing and then update branch pointers; git pull is a fetch followed by a merge (or a rebase).

This model was designed for the Linux kernel’s workflow (thousands of contributors, a hierarchy of trusted maintainers, no single point of failure), but it turned out to be the right model for everyone. It enabled:

  • Offline work — commit on a plane, push when you land
  • Fast operations — everything is local, so everything is fast
  • Multiple remotes — push to your fork, pull from upstream, sync with a colleague’s branch
  • Resilience — if GitHub goes down, every developer still has a complete copy of the project’s history (though not GitHub-side data such as issues and pull-request discussions)
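The object sync that git fetch and git push perform can be modeled as a graph walk. In the sketch below each repository is a hypothetical dict from commit ID to parent IDs; real Git also transfers the trees and blobs each commit references and negotiates a packfile over the wire, none of which is modeled here. The walk starts at the remote branch tip, copies every commit the local side lacks, stops descending wherever it reaches a commit it already has, and finally moves a remote-tracking ref such as `origin/main`.

```python
from collections import deque

Repo = dict[str, list[str]]  # toy model: commit ID -> parent IDs


def fetch(local: Repo, remote: Repo, tip: str,
          refs: dict[str, str], ref_name: str) -> int:
    """Copy every commit reachable from `tip` that `local` lacks; return the count."""
    copied = 0
    queue = deque([tip])
    while queue:
        oid = queue.popleft()
        if oid in local:
            # A complete repository has every ancestor of every commit it holds,
            # so nothing behind this point needs to be sent.
            continue
        local[oid] = list(remote[oid])
        copied += 1
        queue.extend(remote[oid])
    refs[ref_name] = tip  # e.g. move the remote-tracking ref origin/main
    return copied


if __name__ == "__main__":
    origin = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
    alice = {"A": [], "B": ["A"]}
    alice_refs: dict[str, str] = {}
    print(fetch(alice, origin, "D", alice_refs, "origin/main"))  # 2
    print(sorted(alice), alice_refs)  # ['A', 'B', 'C', 'D'] {'origin/main': 'D'}
```

A push is the same walk in the opposite direction, followed by updating the branch on the receiving repository. The early stop is what keeps syncs cheap: only the new tip of history crosses the network.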

#The Merge Machine

Git’s merge capabilities are what made distributed development practical. When two developers change different files, Git merges automatically. When they change different parts of the same file, Git usually merges automatically too. Only when they change the same or adjacent lines does Git require human intervention: a merge conflict.

<<<<<<< HEAD
const timeout = 5000;
=======
const timeout = 10000;
>>>>>>> feature-branch

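Git’s three-way merge compares both branches against their common ancestor (the merge base) to decide, region by region, who changed what: a region changed on only one side takes that side’s version, and a region changed differently on both sides becomes a conflict. A plain two-way diff can’t make that call, because it can’t tell which side made a change. Git’s merge strategy (ort, the default since Git 2.34; recursive before that) also detects renamed or moved files by content similarity, so edits made on one branch can follow a file renamed on the other.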
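The core decision rule of a three-way merge fits in a few lines if you assume all three versions have the same number of lines, so that only in-place edits occur. Real Git first aligns the versions with a diff algorithm and groups neighboring conflicting lines into a single hunk, so treat this Python sketch as an illustration of the logic, not Git’s implementation. The `merge3` name and the base value `3000` are hypothetical.

```python
def merge3(base: list[str], ours: list[str], theirs: list[str],
           ours_label: str = "HEAD",
           theirs_label: str = "feature-branch") -> tuple[list[str], bool]:
    """Line-aligned three-way merge; returns (merged lines, had_conflict)."""
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("toy model: inputs must have equal line counts")
    merged: list[str] = []
    conflict = False
    for b, o, t in zip(base, ours, theirs):
        if o == t:      # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:    # only their side changed it
            merged.append(t)
        elif t == b:    # only our side changed it
            merged.append(o)
        else:           # both changed it differently: emit conflict markers
            conflict = True
            merged += [f"<<<<<<< {ours_label}", o, "=======", t,
                       f">>>>>>> {theirs_label}"]
    return merged, conflict


if __name__ == "__main__":
    base = ["const retries = 3;", "const timeout = 3000;"]
    ours = ["const retries = 5;", "const timeout = 5000;"]
    theirs = ["const retries = 3;", "const timeout = 10000;"]
    lines, conflict = merge3(base, ours, theirs)
    print("\n".join(lines))
    print(conflict)  # True
```

The `retries` line merges cleanly because only one side touched it; the `timeout` line reproduces the conflict shown above, because the ancestor proves both sides edited it.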

Rebase offers an alternative to merge: instead of combining two branches with a merge commit, replay one branch’s commits on top of the other:

Before rebase:              After rebase:
main:    A─B─C              main:    A─B─C
            \                             \
feature:     D─E            feature:       D'─E'

The result is a linear history that is easier to read and easier to bisect. The tradeoff: rebase creates new commits with new hashes (each replayed commit has a different parent, and the parent’s hash is part of the commit’s content), so it shouldn’t be used on branches other people have already pulled. The merge-vs-rebase debate is one of the longest-running arguments in software development.
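Why rebasing changes hashes follows directly from content addressing. The Python sketch below uses a deliberately simplified, hypothetical commit ID (SHA-1 over just the parent ID and the message, ignoring trees, authors, and timestamps) to replay D and E from the diagram onto C. A real rebase also re-applies each commit’s changes to the new base, which can conflict; this sketch tracks only identities.

```python
import hashlib
from typing import Optional


def toy_commit_id(parent: Optional[str], message: str) -> str:
    """Hypothetical, simplified commit ID: SHA-1 over the parent ID and message only."""
    return hashlib.sha1(f"parent {parent}\n\n{message}".encode()).hexdigest()


def replay(onto: Optional[str], messages: list[str]) -> list[str]:
    """Create one commit per message, each on top of the previous, starting at `onto`."""
    ids: list[str] = []
    parent = onto
    for message in messages:
        parent = toy_commit_id(parent, message)
        ids.append(parent)
    return ids


if __name__ == "__main__":
    a, b, c = replay(None, ["A", "B", "C"])  # main: A-B-C
    d, e = replay(b, ["D", "E"])             # feature forked from B
    d2, e2 = replay(c, ["D", "E"])           # rebased onto C: D'-E'
    # Same messages, new parent, therefore new IDs for every replayed commit.
    print(d != d2, e != e2)  # True True
```

Anyone who already fetched the old D and E still holds those IDs, which is why rebasing a shared branch leaves collaborators with two diverging copies of the same work.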

#GitHub: Git as Social Network

Git was designed for the Linux kernel’s email-based patch workflow. It was powerful but hostile to newcomers. The commands were cryptic, the documentation was famously unhelpful, and the error messages were written by and for Linus Torvalds.

In 2008, Tom Preston-Werner, Chris Wanstrath, PJ Hyett, and Scott Chacon launched GitHub, a web platform that wrapped Git in a social interface. GitHub didn’t change Git. It added:

  • Pull requests — a structured workflow for proposing, reviewing, and merging changes
  • Issues — lightweight bug and feature tracking linked to code
  • Forks — one-click copies of any public repository, enabling drive-by contributions
  • Social features — profiles, stars, followers, activity feeds

GitHub made open-source contribution accessible. Before GitHub, contributing to an open-source project meant subscribing to a mailing list, learning the project’s patch format, and hoping a maintainer would notice. With GitHub, you could fork, change, and submit a pull request in minutes.

By 2024, GitHub hosted over 400 million repositories. The pull request became the standard unit of software change, not just for open source but for teams of every size. On most teams, code review, CI/CD checks, and automated testing are all triggered by a pull request.

#What Git Got Right

Git’s first version was written for one project (the Linux kernel) by one person (Torvalds) in about two weeks. It became the dominant version control system across the software industry:

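  • Content-addressed storage — hashing everything made integrity automatic, deduplication free, and the object model elegant. It also means Git can verify the entire history of a project: if any bit was changed, the hash chain breaks. Blockchains rely on the same hash-chaining principle, but the idea predates both (Merkle trees date to 1979, and Monotone was already identifying revisions by SHA-1 before Git); Git’s achievement was bringing it to everyday version control.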
  • Cheap branches — by making branches nearly free, Git changed how teams work. Feature branches, pull requests, and the fork-and-merge model are all consequences of branches being a 41-byte pointer instead of a full directory copy. The workflow innovation mattered more than the technology.
  • Speed as a feature — Torvalds’s insistence on millisecond operations wasn’t just a performance goal. It changed developer behavior. When branching, committing, and merging are instant, developers do them constantly. When they’re slow (SVN), developers avoid them. Git’s speed made good practices cheap.
  • Distribution as resilience — every clone is a backup. Every fork is a potential continuation of the project. There’s no single point of failure, no single point of control. When a project’s maintainer disappears, anyone with a clone can continue. This property made Git the trust infrastructure for open-source software development.

Torvalds built Git to solve a specific problem: manage the Linux kernel’s source code after losing BitKeeper. He solved it so well that the tool subsumed the entire industry. Git isn’t just version control — it’s the substrate of modern software development. Every commit, every branch, every pull request, every CI pipeline runs on the infrastructure an angry Finn built in two weeks.