How Text Diff Works

What is a diff?

A diff (short for difference) is a representation of the changes between two versions of a text. It shows what was added and what was removed, usually with a few unchanged lines around each change for context, instead of repeating the whole text.

Every time you run git diff, open a pull request, or review a code change, you are looking at a diff. The diff command-line tool has been part of Unix since 1974.

How to read a unified diff

The most common format is the unified diff, used by git:

--- a/app.js
+++ b/app.js
@@ -12,5 +12,6 @@
 function greet(name) {
-  return "Hello " + name;
+  const greeting = `Hello, ${name}!`;
+  return greeting;
 }

 module.exports = { greet };

--- and +++: the two files being compared (before and after)

@@ -12,5 +12,6 @@: the hunk header. The hunk covers 5 lines of the old file starting at line 12, and 6 lines of the new file starting at line 12. Both counts include the unchanged context lines.

Lines starting with -: removed from the original

Lines starting with +: added in the new version

Lines starting with a space: unchanged context lines (usually 3 shown on each side)
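
To make the hunk header format concrete, here is a minimal Python sketch that pulls the four numbers out of a header. The function name parse_hunk_header and the sample header are made up for illustration; the only assumption about the format is that a count may be omitted, in which case it is taken to be 1.

```python
import re

# Matches a unified-diff hunk header such as "@@ -40,6 +42,9 @@".
# Each ",count" part is optional; a missing count is treated as 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


# Hypothetical header: 6 old lines from line 40, 9 new lines from line 42.
print(parse_hunk_header("@@ -40,6 +42,9 @@"))  # (40, 6, 42, 9)
```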

How the algorithm works

Most diff tools are built around the Longest Common Subsequence (LCS) problem. Given two sequences of lines, the tool finds the longest sequence of lines that appears in both files in the same order (not necessarily consecutively). Those lines are treated as unchanged. Every other line is either a deletion from the original or an insertion in the new version.

For example, diffing these two lists:

Original

apple
banana
cherry
date

Modified

apple
blueberry
cherry
date
elderberry

The LCS is apple, cherry, date. The diff shows: banana removed, blueberry added, elderberry added.
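
The fruit example can be reproduced with a small dynamic-programming sketch in Python. This is a textbook LCS table, not what any particular diff tool runs internally, and the names lcs_table and line_diff are invented for this example.

```python
def lcs_table(a, b):
    """table[i][j] holds the LCS length of the suffixes a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(a, b):
    """Return a list of (marker, line) pairs: ' ' kept, '-' removed, '+' added."""
    table = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] keeps the LCS at least as long: record a deletion.
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    # Whatever is left over on either side is pure deletion or insertion.
    ops.extend(("-", line) for line in a[i:])
    ops.extend(("+", line) for line in b[j:])
    return ops


original = ["apple", "banana", "cherry", "date"]
modified = ["apple", "blueberry", "cherry", "date", "elderberry"]
for marker, line in line_diff(original, modified):
    print(marker + line)
```

The table costs time and memory proportional to the product of the two file lengths, which is one reason practical tools use faster approaches.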

Myers diff algorithm

The default algorithm in git (and in many other modern tools) is the Myers diff algorithm, published in 1986. It finds the shortest edit script: the minimum number of line insertions and deletions needed to transform one file into the other.

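Myers is fast enough for large files and usually produces compact, human-readable diffs. Git also ships alternative algorithms, patience and histogram, which you can select with git diff --patience, git diff --histogram, or the diff.algorithm setting. They try to anchor the diff on lines that occur rarely, so common boilerplate lines (like empty lines or closing braces) are less likely to be matched against the wrong counterpart and produce a confusing diff.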
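
The idea of a shortest edit script can be sketched in Python as follows. This is a simplified version of the greedy search from Myers' paper that only returns the length of the script, not the script itself, and it is not git's actual implementation. The function name shortest_edit_length is an invention for this example.

```python
def shortest_edit_length(a, b):
    """Minimum number of line insertions plus deletions that turn a into b."""
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k, where k = x - y,
    # x counts lines consumed from a and y counts lines consumed from b.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]      # step down: insert a line from b
            else:
                x = furthest[k - 1] + 1  # step right: delete a line from a
            y = x - k
            # Matching lines cost nothing, so slide along them as far as possible.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


original = ["apple", "banana", "cherry", "date"]
modified = ["apple", "blueberry", "cherry", "date", "elderberry"]
# One deletion (banana) and two insertions (blueberry, elderberry).
print(shortest_edit_length(original, modified))  # 3
```

The search tries scripts of length 0, 1, 2, and so on, so it finishes quickly when the two files are similar, which is the common case for diffs.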

Diff in git

Common git diff commands:

# Changes in working directory (not staged)
git diff

# Changes that are staged (ready to commit)
git diff --staged

# Changes between two commits
git diff abc123 def456

# Changes between a branch and main
git diff main..feature/my-branch

# Word-level diff (highlights changed words, not lines)
git diff --word-diff

Three-way merge

When two people edit the same file on different branches, git needs to merge their changes. It does this with a three-way merge: it compares both edited versions against their common ancestor (the merge base, the most recent commit both branches share).

If two people edited different parts of the file, git can merge automatically. If they edited the same lines, git flags a merge conflict and asks you to resolve it manually:

<<<<<<< HEAD
  return "Hello " + name;
=======
  return `Hi, ${name}!`;
>>>>>>> feature/greeting

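The decision rule behind a three-way merge can be sketched in Python for a single region of a file. This is a deliberate simplification: it assumes the three versions of the region have already been lined up, whereas a real merge first diffs each side against the ancestor to find those regions. The function merge_region, its labels, and the sample base line are all made up for illustration.

```python
def merge_region(base, ours, theirs, ours_label="HEAD", theirs_label="theirs"):
    """Merge one aligned region, where each argument is a list of lines."""
    if ours == theirs:
        return ours    # both sides agree, or neither side changed anything
    if ours == base:
        return theirs  # only their side changed this region
    if theirs == base:
        return ours    # only our side changed this region
    # Both sides changed the region in different ways: report a conflict.
    return (
        [f"<<<<<<< {ours_label}"]
        + ours
        + ["======="]
        + theirs
        + [f">>>>>>> {theirs_label}"]
    )


base = ["  return name;"]  # hypothetical common-ancestor version
ours = ['  return "Hello " + name;']
theirs = ["  return `Hi, ${name}!`;"]

print("\n".join(merge_region(base, ours, theirs, theirs_label="feature/greeting")))
```

With these inputs both sides differ from the ancestor and from each other, so the function prints the same kind of conflict block shown above.
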