How git stores revisions
January 29, 2025
In this article, I inspect the internals of git through experiments. I create repositories, make commits, and use git plumbing commands to reveal how git stores its data.
Loose Objects
In the accompanying listing, I commit a file into a Git repository. The SHA of the commit is 25831cb. A commit is one kind of git object, and I can verify its type with the cat-file -t command.
Next, in the accompanying listing, I dig into the commit object d98140. Pretty-printing it shows that d98140 references a tree object with SHA 8037ccf, which in turn references a blob with SHA 0ffd544.
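The commit, tree, and blob chain can also be reproduced without git. The following is a minimal sketch, assuming the default SHA-1 object format; the file name `notes.txt`, the author line, and the timestamp are hypothetical, and the entry sorting is simplified compared with what git does for directories.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    # Git hashes "<type> <size>", a NUL byte, then the body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    # Each entry is "<mode> <name>", a NUL byte, then the 20 raw id bytes.
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


def parse_tree(body: bytes):
    # Walk the entries back out of a serialized tree.
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        hex_id = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, hex_id))
        pos = nul + 21
    return entries


# Hypothetical file content and metadata, for illustration only.
blob_id = object_id("blob", b"hello world\n")
tree = tree_body([("100644", "notes.txt", blob_id)])
tree_id = object_id("tree", tree)
commit = (
    f"tree {tree_id}\n"
    "author A U Thor <author@example.com> 1738108800 +0000\n"
    "committer A U Thor <author@example.com> 1738108800 +0000\n"
    "\n"
    "Add notes\n"
).encode()
commit_id = object_id("commit", commit)

print("commit", commit_id[:7], "-> tree", tree_id[:7], "-> blob", blob_id[:7])
```

The commit only names the tree, and the tree only names the blob, so following the chain means reading three separate objects.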
So far, we have met three git objects. All of them are stored as loose objects inside the .git folder; see the accompanying listing.
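As a rough sketch of where a loose object lives: git uses the first two hex characters of the full object id as a directory name under `.git/objects` and the remaining characters as the file name. The helper name `loose_object_path` is my own, and it assumes a full 40-character SHA-1 id rather than the abbreviated ids used in this article.

```python
from pathlib import Path


def loose_object_path(git_dir: str, hex_id: str) -> Path:
    # Hypothetical helper: requires the full 40-character SHA-1 id.
    if len(hex_id) != 40:
        raise ValueError("expected a full 40-character object id")
    return Path(git_dir) / "objects" / hex_id[:2] / hex_id[2:]


# The id of the empty blob, used here only as a well-known example.
path = loose_object_path(".git", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
print(path.as_posix())
```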
Now let’s focus on the blob object revealed in the earlier listing, which is supposed to store the content of the file. Opening the file with a text editor shows garbled text (see the accompanying figure) because git compresses it with zlib, which uses the DEFLATE algorithm also found in the zip format; the file itself is not a zip archive. I happen to have the openssl tool on my system, which comes with the zlib library, so I use it to decompress the file and get back the original text; see the accompanying listing. “blob” is the type of the object. 2126 is the length of the content in bytes, which I assume includes the newline character at the end of the file. A NUL byte follows the length, and after it comes the actual content of the file. For the performance of zlib, please see “Performances of Compression Algorithms”.
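The same decompression can be sketched in Python with the standard `zlib` module. The function name `parse_loose_object` and the sample content are assumptions of mine; the sketch builds a loose blob in memory instead of reading a real file from the repository.

```python
import zlib


def parse_loose_object(raw: bytes):
    # A loose object is zlib-compressed "<type> <size>", a NUL byte, then the body.
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size_text = header.split(b" ")
    size = int(size_text)
    if size != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode(), size, body


# Simulate a loose blob with hypothetical content.
content = b"The year is 1824.\n"
raw = zlib.compress(b"blob " + str(len(content)).encode() + b"\x00" + content)

obj_type, size, body = parse_loose_object(raw)
print(obj_type, size)
print(body.decode(), end="")
```

To inspect a real object, pass the bytes of a file under `.git/objects` to `parse_loose_object` instead of the simulated `raw` value.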
Now I make a new commit by changing 1824 to 1825, a difference of one character. From the accompanying listing, we know the SHA of the new commit is 0fb7d75. It references a tree object with SHA 098cad8, which in turn references a blob with SHA ca2657b. Decompressing the new blob gives back the full content of the file, not just the change.
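To illustrate why a one-character edit yields a completely new loose object, here is a small sketch that builds two blobs in memory. The file content is made up, and `loose_blob` is a hypothetical helper that assumes the SHA-1 object format.

```python
import hashlib
import zlib


def loose_blob(content: bytes):
    # Return (object id, compressed bytes) as git would store a loose blob.
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


# Hypothetical file: 100 filler lines followed by the line that changes.
old_text = b"".join(b"line %d of some notes\n" % i for i in range(100))
old_text += b"founded in 1824\n"
new_text = old_text.replace(b"1824", b"1825")

old_id, old_file = loose_blob(old_text)
new_id, new_file = loose_blob(new_text)

changed_bytes = sum(a != b for a, b in zip(old_text, new_text))
print("bytes changed in the file:", changed_bytes)
print("object ids differ:", old_id != new_id)
print("compressed sizes:", len(old_file), len(new_file))
```

Both compressed objects are roughly the same size because each one holds the whole file; nothing in the second object refers back to the first.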
Packfiles
Git runs garbage collection and maintenance at certain intervals. It packs loose objects into packfiles, which are compressed with zlib too. A packfile is like a virtual file system. Whether that file system uses deltas is a subtle implementation detail.[1] Files with similar content may be placed together, with a delta applied to the first file to reconstruct the second. Notice that the two files are not necessarily from consecutive commits.[2] The arrangement of files is entirely up to the packfile subsystem.
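Conceptually, a packfile delta rebuilds a target object from a base object using two kinds of instructions: copy a byte range from the base, or insert new literal bytes. The sketch below illustrates that idea with plain Python tuples; it is not git's binary delta encoding, and the instruction names and sample data are my own.

```python
def apply_delta(base: bytes, instructions) -> bytes:
    # Rebuild the target from copy/insert instructions (simplified model).
    out = bytearray()
    for op in instructions:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown instruction: {op[0]!r}")
    return bytes(out)


# Hypothetical base content and a delta that changes 1824 to 1825.
base = b"founded in 1824\n"
start = base.index(b"1824")
delta = [
    ("copy", 0, start + 3),                         # everything up to "182"
    ("insert", b"5"),                               # the one new character
    ("copy", start + 4, len(base) - (start + 4)),   # the rest of the base
]

target = apply_delta(base, delta)
print(target.decode(), end="")
```

The delta describes how to rebuild bytes, not which lines changed, which is why it differs from the diffs users see.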
Conclusion
This experiment tells us that git compresses the full text of each file and does not use deltas for loose objects. Packfiles may use deltas, but they are an extremely low-level storage detail, and they are not the diffs users see when they compare two revisions, adjacent or otherwise.
Even if a delta between two revisions exists and a user happens to compare exactly those two revisions, there is an array of diffing algorithms to choose from, such as greedy, minimal, patience, and histogram.[3][4] Hence the delta may not be in the shape the user wants. Transforming, say, a greedy diff into a histogram diff is a different topic altogether.
References
- Lasse V. Karlsen. Git internals: how does Git store small differences between revisions? 2017-04-12 [2025-01-29].
- Greg Hewgill. Git internals: how does Git store small differences between revisions? 2017-04-12 [2025-01-29].
- git-diff - Show changes between commits, commit and working tree, etc. [2025-01-29].
- Yusuf Sulistyo Nugroho. How different are different diff algorithms in Git? Empirical Software Engineering. 2019 [2025-01-30].