How version control systems store revisions

January 28, 2025


Subversion

Subversion (svn) uses a technique called “skip-deltas”. In the FSFS back end, each revision of a file is stored as a delta against an older revision of that file. To reconstruct revision N of a file, Subversion does not necessarily start from the delta against N-1: the base may be a much older revision, and one revision can be the base of many newer revisions. With this scheme, reconstructing revision N of a file takes O(lg N) delta applications.
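To make the lg(N) claim concrete, here is a minimal sketch of one way to pick skip-delta bases. It assumes the rule described in Subversion's design notes as I understand them: the base of change number N is N with its lowest set bit cleared, where the change number counts the changes to that one file rather than repository revisions. The function names are my own and are not Subversion APIs.

```python
def skip_delta_base(n: int) -> int:
    """Return the assumed delta base of change n (n >= 1).

    Assumption: the base is n with its lowest set bit cleared.
    Change 0 stands for the stored starting point of the file.
    """
    return n & (n - 1)


def delta_chain(n: int) -> list:
    """List the changes whose deltas are applied, oldest first, to rebuild change n."""
    chain = []
    while n > 0:
        chain.append(n)
        n = skip_delta_base(n)
    return chain[::-1]


if __name__ == "__main__":
    for n in (1, 6, 7, 8, 54):
        print(n, "->", delta_chain(n))
```

Under this assumption the chain for change 54 is 32, 48, 52, 54: one delta per set bit in the binary form of N, so never more than about lg(N) + 1 deltas.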

Despite the lower complexity, with forward deltas the latest file state sits at the end of a delta chain, so retrieving it tends to be among the slower operations. Hence, Subversion also uses “reverse-deltas” (in its older Berkeley DB back end), which do the opposite. Once a revision is committed, previous revisions are re-expressed as deltas against the latest revision. In this way, retrieving the latest file state is always the fastest operation and retrieving revision 1 is the slowest.
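The reverse-delta idea can be sketched in a few lines. This is an illustration only, not Subversion's storage format: the class `ReverseDeltaStore`, the copy/insert delta encoding, and the use of Python's difflib are all my own assumptions, and the sketch uses a plain chain of reverse deltas without skip-delta bases.

```python
from difflib import SequenceMatcher


def make_delta(src, dst):
    """Describe how to build the line list dst out of the line list src."""
    delta = []
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse src[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", dst[j1:j2]))  # literal lines from dst
        # "delete" needs no instruction: those lines are simply not copied
    return delta


def apply_delta(src, delta):
    """Rebuild the target line list from src and a delta made by make_delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(src[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseDeltaStore:
    """Hypothetical store: newest revision in full, older ones as reverse deltas."""

    def __init__(self):
        self.head = None  # full text (list of lines) of the newest revision
        self.back = []    # back[i] turns revision i+2 into revision i+1

    def commit(self, lines):
        if self.head is not None:
            # Re-express the old head as a delta against the new head.
            self.back.append(make_delta(lines, self.head))
        self.head = list(lines)
        return len(self.back) + 1  # revision numbers start at 1

    def checkout(self, rev):
        if self.head is None or not 1 <= rev <= len(self.back) + 1:
            raise IndexError(rev)
        lines = self.head
        # Walk backwards from the newest revision down to rev.
        for delta in reversed(self.back[rev - 1:]):
            lines = apply_delta(lines, delta)
        return list(lines)
```

Checking out the newest revision applies no delta at all, while checking out revision 1 applies every stored delta, which matches the trade-off described above.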

Because of reverse-deltas and the skip-delta choice of bases, the deltas used to reconstruct files are rarely the same as the diff output that compares successive revisions.

Perforce Helix Core

Helix Core, formerly Perforce Helix, is a proprietary version control system. It was initially released in 1995 by a graduate student from UC Berkeley.

Like Subversion, Perforce is a centralized version control system. Perforce has a strong position in game development[1] because a single Perforce repository can hold both program code and art assets, and Perforce is easier for non-programmers (such as artists) to use. In my opinion, game engine code should be kept separate from game assets, and even from plot scripts and in-game text, but that’s a different topic.

By default, text files are stored in reverse delta format, and binary files are stored in their entirety.[2]

Git

See the details in “How git stores revisions”. In a nutshell, Git generally compresses the full text of individual files. Even though deltas are used in its internal packfiles, those deltas cannot be used to produce diffs when users want to see the content of a commit.
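As a small illustration of “compresses the full text”, the sketch below builds a loose blob object the way I understand Git's default SHA-1 object format: a `blob <size>` header, a NUL byte, then the file content, hashed with SHA-1 and compressed with zlib. The function name `git_blob` is my own, and packfiles are not covered here.

```python
import hashlib
import zlib


def git_blob(content: bytes):
    """Return (object id, compressed bytes) for a file's full content."""
    # Header and content are hashed together; the same bytes are then compressed.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()
    return oid, zlib.compress(store)


if __name__ == "__main__":
    oid, packed = git_blob(b"hello world\n")
    print(oid, len(packed), "bytes compressed")
```

The object id depends only on the full content, not on any earlier revision, so nothing in this representation records what changed between two versions of a file.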

Mercurial

Mercurial is a sibling of Git. When BitKeeper stopped offering its free version to the Linux community, two version control systems were created: one is Git, the other is Mercurial.

For each revision of a file, Mercurial either stores a delta or compresses the full text of the file as a new base.[3][4]
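The “delta or full-text base” choice can be sketched as an append-only log. This is a toy model, not Mercurial's revlog format: the class `RevlogSketch`, the character-level hunks built with difflib, and the `max_chain_ratio` threshold of 2.0 are all assumptions chosen for illustration.

```python
import zlib
from difflib import SequenceMatcher


def make_delta(src: str, dst: str):
    """Return hunks (start, end, data): replace src[start:end] with data."""
    ops = SequenceMatcher(a=src, b=dst, autojunk=False).get_opcodes()
    return [(i1, i2, dst[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(src: str, delta) -> str:
    out, pos = [], 0
    for start, end, data in delta:
        out.append(src[pos:start])  # unchanged text before the hunk
        out.append(data)            # replacement text
        pos = end
    out.append(src[pos:])
    return "".join(out)


class RevlogSketch:
    """Hypothetical append-only log of one file; revisions start at 0."""

    def __init__(self, max_chain_ratio: float = 2.0):
        self.entries = []     # ("full", zlib bytes) or ("delta", hunks)
        self.chain_size = 0   # delta characters stored since the last full text
        self.tip = None       # text of the newest revision
        self.max_chain_ratio = max_chain_ratio  # assumed threshold

    def add(self, text: str) -> int:
        if self.tip is not None:
            delta = make_delta(self.tip, text)
            cost = sum(len(data) for _, _, data in delta)
            # Keep extending the chain only while it stays cheap to replay.
            if self.chain_size + cost <= self.max_chain_ratio * len(text):
                self.entries.append(("delta", delta))
                self.chain_size += cost
                self.tip = text
                return len(self.entries) - 1
        # First revision, or the chain grew too long: store a new full-text base.
        self.entries.append(("full", zlib.compress(text.encode("utf-8"))))
        self.chain_size = 0
        self.tip = text
        return len(self.entries) - 1

    def get(self, rev: int) -> str:
        base = rev
        while self.entries[base][0] != "full":
            base -= 1  # walk back to the nearest full-text base
        text = zlib.decompress(self.entries[base][1]).decode("utf-8")
        for _, delta in self.entries[base + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text
```

Because entries are only ever appended, retrieval cost is bounded by the distance to the nearest full-text base rather than by the total number of revisions.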

Mercurial does not encourage rewriting history.[5]

Conclusion

Even though diff and patch can create deltas and generate a new file from them, version control systems (VCSs) do not use diff and patch in their storage models. VCSs weigh many more considerations, such as their preference for speed or storage and whether the history is immutable.

Even if a delta between two revisions exists when users want to compare the two revisions, there is an array of diffing algorithms, such as greedy, minimal, patience, and histogram.[6] Hence, the stored delta may not be in the shape that a user wants. Transforming, say, a greedy diff into a histogram diff is a totally different topic.

Storing deltas tends to use less space, but it makes retrieving revisions slower. Under this scheme, there are techniques such as skip-deltas or full-text bases to speed up retrieval.

Immutable history allows files and deltas to be indexed more efficiently, as Mercurial does.

References

  1. Matthäus Niedoba. Git vs Perforce for Unreal and Unity projects. 2024-03-01 [2025-01-28].
  2. Basic Concepts. [2025-01-29].
  3. How does Mercurial store its data?. [2025-01-28].
  4. Behind the scenes. [2025-01-28].
  5. Histedit Extension. [2025-01-28].
  6. git-diff - Show changes between commits, commit and working tree, etc. [2025-01-29].