How git stores revisions

January 29, 2025


In this article, I inspect the internals of git through experiments. I create a repository, make commits, and use git plumbing commands to reveal how git stores its data.

Loose Objects

$ echo "Lewis W. Green..." > 1.txt
$ git add 1.txt
$ git commit -m "add"
$ git log --name-status
commit 25831cbfddcf825c10da36831665186536ddab8a (HEAD -> master)
Author: gqqnbig <gqqnbig@gqqnbig.me>
Date:   Tue Jan 28 11:45:59 2025 -0700

    add

A       1.txt
$ git cat-file -t 25831cbfddcf825c10da36831665186536ddab8a
commit
: Add a file into a Git repository

In the listing above, I commit a file into a Git repository. The abbreviated SHA of the commit is 25831cb. A commit is one type of git object, which I can verify with the cat-file -t command.


$ git cat-file -p 25831cbfddcf825c10da36831665186536ddab8a
tree 8037ccfdc68acd90e00d9c6f0c31bd7827d6a59a
author gqqnbig <gqqnbig@gqqnbig.me> 1738061159 -0700
committer gqqnbig <gqqnbig@gqqnbig.me> 1738061159 -0700

add

$ git cat-file -p 8037ccfdc68acd90e00d9c6f0c31bd7827d6a59a
100644 blob  0ffd5446fe69ee8c68bb529fbe21e218da190d5b 1.txt
: Inspect the internals of git objects

Next, in the listing above, I dig into the commit object 25831cb. Pretty-printing it with cat-file -p shows that 25831cb references a tree object with SHA 8037ccf. In turn, 8037ccf references a blob with SHA 0ffd544.
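The pretty-printed commit is plain text: header lines of the form "key value", a blank line, then the message. Here is a minimal sketch of reading it, assuming an unsigned commit with single-line headers. The function name parse_commit is my own and not part of git.

```python
def parse_commit(text: str) -> dict:
    """Split pretty-printed commit text into header fields and the message.

    Simplification: assumes every header fits on one line, which is not
    true for signed commits (their signature header spans several lines).
    """
    header, _, message = text.partition("\n\n")
    fields = {"parent": []}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # a merge commit has several parents
        else:
            fields[key] = value
    fields["message"] = message.strip()
    return fields


# The output of `git cat-file -p 25831cb...` shown in the listing above.
commit_text = """tree 8037ccfdc68acd90e00d9c6f0c31bd7827d6a59a
author gqqnbig <gqqnbig@gqqnbig.me> 1738061159 -0700
committer gqqnbig <gqqnbig@gqqnbig.me> 1738061159 -0700

add
"""

commit = parse_commit(commit_text)
print(commit["tree"])
print(commit["parent"])
```

The first commit has no parent line, so its parent list is empty. Later commits carry one parent line per parent.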

So far, we have met three git objects: a commit, a tree, and a blob. All three are stored as files under the .git/objects folder, as the next listing shows.

$ ls -l -R .git/objects/
.git/objects/:
total 0
drwxr-xr-x 1 gqqnbig 197121 0 Jan 28 12:48 0f/
drwxr-xr-x 1 gqqnbig 197121 0 Jan 28 12:48 25/
drwxr-xr-x 1 gqqnbig 197121 0 Jan 28 12:48 80/
drwxr-xr-x 1 gqqnbig 197121 0 Jan 28 12:48 info/
drwxr-xr-x 1 gqqnbig 197121 0 Jan 28 12:48 pack/

.git/objects/0f:
total 4
-r--r--r-- 1 gqqnbig 197121 807 Jan 28 12:48 fd5446fe69ee8c68bb529fbe21e218da190d5b

.git/objects/25:
total 1
-r--r--r-- 1 gqqnbig 197121 118 Jan 28 12:48 831cbfddcf825c10da36831665186536ddab8a

.git/objects/80:
total 1
-r--r--r-- 1 gqqnbig 197121 50 Jan 28 12:48 37ccfdc68acd90e00d9c6f0c31bd7827d6a59a
: The structure of the .git folder
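The directory listing shows the naming rule for loose objects: the first two hex digits of the SHA become a folder name, and the remaining 38 digits become the file name. Here is a small sketch of that mapping, assuming a repository that uses 40-digit SHA-1 names. The helper loose_object_path is hypothetical.

```python
from pathlib import PurePosixPath


def loose_object_path(object_id: str, git_dir: str = ".git") -> PurePosixPath:
    """Return where a loose object with this full SHA-1 name would be stored."""
    if len(object_id) != 40:
        raise ValueError("expected a full 40-digit SHA-1 name")
    # First two hex digits: folder. Remaining 38 digits: file name.
    return PurePosixPath(git_dir) / "objects" / object_id[:2] / object_id[2:]


print(loose_object_path("0ffd5446fe69ee8c68bb529fbe21e218da190d5b"))
```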

Now let’s focus on the blob object 0ffd544 revealed above, which is supposed to store the content of the file. Opening the file with a text editor gives me garbled text, shown in the figure below, because git compresses it with zlib, which uses the same DEFLATE algorithm as the zip format. I happen to have the openssl tool on my system, built with zlib support. Hence I decompress the file and get back the original text, as the listing below shows. “blob” is the type of the object. 2126 is the size of the content in bytes, which includes the newline character at the end of the file. A NUL byte, invisible in the terminal output, separates this header from the actual content of the file. For the performance of zlib, please see “Performances of Compression Algorithms”.

: We can’t open the blob file with a text editor
$ openssl zlib -d < /b/git-test/.git/objects/0f/fd5446fe69ee8c68bb529fbe21e218da190d5b
blob 2126Lewis W. Green (January 28, 1806   May 26, 1863) was an American Presbyterian minister, 
educator, and academic administrator.
Born in Danville, Kentucky, and educated in Woodford County,
he enrolled at Transylvania University but transferred to Centre College to complete his degree.
He graduated in 1824 as one of two members of Centre's first graduating class...
: Decompressing the file gives the same content as git cat-file -p
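The same decompression can be done without openssl. The sketch below uses Python's standard zlib module and splits the result into type, size, and content. A NUL byte separates the header from the content. The function name parse_loose_object is my own, and the sample object is built in memory so the example runs without a repository.

```python
import zlib


def parse_loose_object(compressed: bytes):
    """Decompress a loose object and split it into (type, size, content)."""
    raw = zlib.decompress(compressed)
    # The header looks like b"blob 2126" and ends at the first NUL byte.
    header, _, content = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return kind.decode(), int(size), content


# Stand-in for the bytes of a file under .git/objects/.
sample = zlib.compress(b"blob 6\0hello\n")
print(parse_loose_object(sample))
```

To read a real object, pass the bytes of the file instead of the sample, for example the file under .git/objects/0f/ shown earlier.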

Now I make a new commit that changes 1824 to 1825, a difference of one character. From the listing below, we know the SHA of the new commit is 0fb7d75. It references a tree object with SHA 098cad8, which in turn references a blob with SHA ca2657b. Decompressing the new blob gives back the full content of the file, not just the changed character.

$ git cat-file -p 0fb7d75560b9b5baa866f1f38cab73a708381841
tree 098cad8c26f32523717817f72f1c5a61f32fcf7a
parent 25831cbfddcf825c10da36831665186536ddab8a
author gqqnbig <gqqnbig@gqqnbig.me> 1738071078 -0700
committer gqqnbig <gqqnbig@gqqnbig.me> 1738071078 -0700

fix year

$ git cat-file -p 098cad8c26f32523717817f72f1c5a61f32fcf7a
100644 blob ca2657b2a6412a3082d1bc8efcdf44a9d7c955bb    1.txt

$ openssl zlib -d < /b/git-test/.git/objects/ca/2657b2a6412a3082d1bc8efcdf44a9d7c955bb
blob 2126Lewis W. Green (January 28, 1806   May 26, 1863) was an American Presbyterian minister, 
educator, and academic administrator.
Born in Danville, Kentucky, and educated in Woodford County,
he enrolled at Transylvania University but transferred to Centre College to complete his degree.
He graduated in 1825 as one of two members of Centre's first graduating class...
: Change a letter in the file, and commit it
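Why does a one-character change produce a completely different blob name? The name is the SHA-1 hash of the header plus the content, so any change to the content changes the hash. Here is a minimal sketch, assuming the repository uses SHA-1 object names (the default) rather than SHA-256. The function git_blob_id is my own name, and the two sample lines are shortened stand-ins for the real file.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object name git gives to a blob with this content."""
    header = b"blob %d\0" % len(content)  # type, size in bytes, NUL separator
    return hashlib.sha1(header + content).hexdigest()


old = b"He graduated in 1824\n"
new = b"He graduated in 1825\n"
print(git_blob_id(old))
print(git_blob_id(new))
```

The two names share nothing recognizable, and each name points to its own loose object holding the full content.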

Packfiles

Git runs garbage collection and maintenance automatically from time to time. This packs loose objects into packfiles, which are also zlib-compressed. A packfile is like a virtual file system. Whether it uses a delta for a given object is an implementation detail.[1] Objects with similar content may be placed together, with one stored as a delta against the other. Notice that the two objects are not necessarily from consecutive commits.[2] The arrangement is entirely up to the packfile subsystem.
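I do not decode a whole packfile here, but its first 12 bytes are easy to read: the signature "PACK", a version number, and the number of objects, the last two as 4-byte big-endian integers. Below is a minimal sketch of reading that header. The function name parse_pack_header is my own, and the sample bytes are built in memory rather than taken from a real .pack file.

```python
import struct


def parse_pack_header(data: bytes):
    """Read the 12-byte packfile header and return (version, object_count)."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


# Stand-in for the first 12 bytes of a file under .git/objects/pack/.
sample = struct.pack(">4sII", b"PACK", 2, 3)
print(parse_pack_header(sample))
```

The objects that follow the header, including any deltas, use a more involved encoding that this sketch does not cover.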

Conclusion

This experiment tells us that git stores each loose object as the compressed full content of an individual file and does not use deltas for loose objects. Packfiles may use deltas, but these are a low-level storage detail. They are not the diffs that users see when they compare two revisions, whether adjacent or not.

Even if a delta between two revisions exists and a user happens to compare exactly those two revisions, git offers several diff algorithms, such as myers (the default, greedy), minimal, patience, and histogram.[3][4] Hence the stored delta may not have the shape the user wants. Transforming, say, a greedy diff into a histogram diff is a different topic altogether.
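The diff algorithm is chosen when the user asks for a diff, through the --diff-algorithm option of git diff, and not when objects are stored. The sketch below only builds the command line for two revisions. The helper diff_command is hypothetical, and the revisions are the two commits from this article.

```python
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(old_rev: str, new_rev: str, algorithm: str = "myers") -> list:
    """Build a `git diff` command line that uses the chosen diff algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return ["git", "diff", f"--diff-algorithm={algorithm}", old_rev, new_rev]


for name in ALGORITHMS:
    print(" ".join(diff_command("25831cb", "0fb7d75", name)))
```

Each list can be passed to subprocess.run inside the repository. The same pair of revisions can give differently shaped diffs depending on the algorithm.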

References

  1. Lasse V. Karlsen. Git internals: how does Git store small differences between revisions? 2017-04-12 [2025-01-29].
  2. Greg Hewgill. Git internals: how does Git store small differences between revisions? 2017-04-12 [2025-01-29].
  3. git-diff - Show changes between commits, commit and working tree, etc. [2025-01-29].
  4. Yusuf Sulistyo Nugroho et al. How different are different diff algorithms in Git? Empirical Software Engineering. 2019 [2025-01-30].