Performances of Compression Algorithms
January 29, 2025
I found a CatchChallenger benchmark run between 2012 and 2016[1], which evaluated gzip, bzip2, lzma, the extreme option of lzma[2], xz, the extreme option of xz, lz4, and lzop.
- gzip implements the Deflate algorithm, the same algorithm that the zlib library provides.
- lzma is the core algorithm in 7zip.
- xz is a newer container format built on LZMA2, a variant of lzma, so it should be on par with lzma.
I set gzip level 5 as the baseline and normalized all values relative to it. In terms of compression ratio, even level 1 of bzip2, lzma, lzma -e, xz, and xz -e is better than level 9 of gzip. On the other hand, bzip2, lzma, lzma -e, xz, and xz -e are slower than the slowest gzip level in exchange for the compression ratio they gain. The remaining two algorithms, lz4 and lzop, favor speed over file size.
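The normalization step can be sketched with Python's standard library. This is a minimal sketch, not the benchmark's own method: `zlib`, `bz2`, and `lzma` stand in for the gzip, bzip2, and xz command-line tools, the sample data is made up, and `measure` and `normalize` are hypothetical helper names. The ratio here is assumed to be compressed size divided by original size, so smaller is better.

```python
import bz2
import lzma
import time
import zlib


def measure(compress, data):
    """Return (compressed size / original size, elapsed seconds) for one run."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return len(out) / len(data), elapsed


def normalize(results, baseline):
    """Divide every (ratio, time) pair by the baseline's pair."""
    base_ratio, base_time = results[baseline]
    return {
        name: (ratio / base_ratio, seconds / base_time)
        for name, (ratio, seconds) in results.items()
    }


# Made-up sample input; a real benchmark would use representative files.
data = b"The quick brown fox jumps over the lazy dog. " * 20000

# Library stand-ins for the command-line tools (an assumption of this sketch).
compressors = {
    "gzip -1": lambda d: zlib.compress(d, 1),
    "gzip -5": lambda d: zlib.compress(d, 5),
    "gzip -9": lambda d: zlib.compress(d, 9),
    "bzip2 -1": lambda d: bz2.compress(d, 1),
    "xz -1": lambda d: lzma.compress(d, preset=1),
}

results = {name: measure(fn, data) for name, fn in compressors.items()}
normalized = normalize(results, "gzip -5")

for name, (ratio, seconds) in normalized.items():
    print(f"{name:9s} ratio={ratio:.2f} time={seconds:.2f}")
```

After normalization the baseline row is exactly 1 in both columns, and every other row reads as a multiple of gzip level 5. Timings from a single run on a small input are noisy, so the numbers only illustrate the bookkeeping.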
To combine compression ratio and compression time into a single score, I multiply them together. With ratio and time weighted 1:1, as shown in the first table, only lz4 and lzop are better than the baseline (gzip level 5, whose value is 1). In addition, the three levels of lz4 make no difference, so we can simply choose level 1. If you have to stick with lzop, choose level 3 or 5, because their weighted performance is better than that of levels 1 and 2.
If a compressed file is going to be stored for a long time, the compression ratio should carry more weight. With a weight of 4:1, no configuration is better than gzip -5.
If the compression ratio is weighted even more heavily, as shown in the third table, lzma and xz exhibit a marginal advantage over gzip.
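The weighted comparison can be sketched as follows. Two things here are assumptions rather than facts from the post: the weights are applied as exponents, so a 4:1 weighting means ratio to the fourth power times time, and 8:1 is a placeholder for the heavier third weighting. The sample values are invented for illustration and are not taken from the benchmark. Both inputs are normalized to the baseline, and a lower score is better.

```python
def weighted_score(ratio, seconds, ratio_weight=1, time_weight=1):
    """Combine normalized ratio and time; the baseline always scores 1."""
    return ratio ** ratio_weight * seconds ** time_weight


# Invented normalized (ratio, time) pairs, not benchmark data.
samples = {
    "baseline": (1.0, 1.0),   # stands for gzip -5
    "fast": (1.5, 0.25),      # larger output, much quicker
    "small": (0.8, 4.0),      # smaller output, much slower
}

# (ratio weight, time weight); 8:1 is a hypothetical "even heavier" setting.
for ratio_weight, time_weight in [(1, 1), (4, 1), (8, 1)]:
    print(f"weights {ratio_weight}:{time_weight}")
    for name, (ratio, seconds) in samples.items():
        score = weighted_score(ratio, seconds, ratio_weight, time_weight)
        print(f"  {name:8s} {score:.3f}")
```

With these invented numbers the pattern matches the three tables: at 1:1 only the fast compressor beats the baseline, at 4:1 nothing does, and at the heaviest weighting the slow, high-ratio compressor pulls ahead.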
To conclude, lz4 is a good choice for transient data or data sent over a network. Normally, gzip level 5 strikes the best balance between file size and compression time. As a side note, Git uses the Deflate algorithm (via zlib), the same one behind gzip, to compress loose objects and packs. For long-term storage, you may consider lzma and xz, i.e. 7zip, but they do not differ significantly from gzip. Nevertheless, if the archive is for sharing, focus on compatibility and popularity[3]. Do not use a compression format that a layperson does not recognize.
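The side note about Git can be illustrated with a short sketch. It assumes the default SHA-1 object format, in which a loose blob is the header `blob <size>` plus a NUL byte, followed by the file content, all compressed with zlib. `git_blob` is a hypothetical helper name, and the compression level Git actually applies depends on its configuration, so the default level used here is only a stand-in.

```python
import hashlib
import zlib


def git_blob(content: bytes, level: int = -1):
    """Return (object id, zlib-compressed bytes) for a blob, as a sketch."""
    header = b"blob %d\x00" % len(content)
    store = header + content
    # The object id is the SHA-1 of the uncompressed header plus content.
    object_id = hashlib.sha1(store).hexdigest()
    return object_id, zlib.compress(store, level)


object_id, compressed = git_blob(b"hello world\n")
print(object_id)

# Decompressing gives back the header and the original content.
restored = zlib.decompress(compressed)
header, _, body = restored.partition(b"\x00")
print(header, body)
```

The id printed by this sketch can be compared with what `git hash-object` reports for a file holding the same bytes.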
“How to Get Both Speed and Compression Ratio? Optimizing Compression Algorithms in Build and Deployment” (速度与压缩比如何兼得?压缩算法在构建部署中的优化) compares the performance of gzip, Brotli, Zstd, LZ4, Parallel gzip (pigz), ISA-L, and Parallel Zstd (Pzstd). Their test data is 1 gigabyte in size, consisting of program source code and artifacts. Their benchmark machine has 764 cores, so all the parallel algorithms achieve impressive performance. What’s more, Zstd, which claims to be a competitor of lzma, and lz4 are better than gzip in both time and ratio. In the CatchChallenger benchmark, no algorithm beats gzip in both dimensions.
See also “Common Knowledge About File Compression That Computational Chemistry Researchers Should Know” (计算化学研究者应当了解的文件压缩的常识).
References
- Alpha one x86. Quick Benchmark: Gzip vs Bzip2 vs LZMA vs XZ vs LZ4 vs LZO. 2016-10-09 [2025-01-29].
- lzma(1) - Linux man page. [2025-01-29].
- WMW. A Detailed Explanation of Compression Algorithms, Compression Formats, and Compression Software (压缩算法、压缩格式及压缩软件的详解). 2024-04-13 [2025-01-29].