Comparing ZIP compression methods

Preamble

This is a comparison between compression methods, mainly those of 7zip and the standard Unix tools.
This is not an analysis of how the different compression algorithms work or how they compare to each other in theory.
It is a simple comparison of the compressed sizes for a small set of data.

LZMA2 compresses worse when it uses multiple threads. It is recommended to use 1 or at most 2 threads, because with more than 2 threads the input is split into separate blocks that are compressed independently. Similar content that ends up in different blocks is then compressed twice instead of once. (This is described in the 7zip documentation.)
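The cost of independent blocks can be illustrated with a small Python sketch. It does not use 7zip's actual multithreading; it assumes that multithreaded LZMA2 behaves roughly like compressing separate blocks on their own, and simulates that with Python's standard lzma module on hypothetical data whose second half repeats the first.

```python
import lzma
import random


def solid_vs_split(data: bytes, parts: int) -> tuple[int, int]:
    """Return (size as one stream, total size as `parts` independent streams).

    Independent streams cannot refer back to each other, which is the
    assumed effect of cutting the input into blocks for parallel work.
    """
    whole = len(lzma.compress(data))
    step = len(data) // parts
    # The last chunk takes any remainder so no bytes are dropped.
    bounds = [i * step for i in range(parts)] + [len(data)]
    split = sum(
        len(lzma.compress(data[start:end]))
        for start, end in zip(bounds, bounds[1:])
    )
    return whole, split


# Hypothetical input: 100 KB of random bytes stored twice, standing in for
# an archive whose second half repeats content from the first half.
block = random.Random(0).randbytes(100_000)
whole, split = solid_vs_split(block + block, parts=2)
print(f"one stream: {whole} bytes, two independent streams: {split} bytes")
```

In the single stream the repeated half is encoded as cheap back-references; in the split version each half has to be stored in full.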

Comparison with Images

This test was conducted with 35 images from a cosplay photo shoot of Megumi Koneko. A sample picture can be seen above.

Warning: These images are in .jpg format! Results with png images could be different!

All 7zip compressions were done with the "Compression Level" set to "Ultra" and, as mentioned above, with only 1 thread.

The uncompressed size of the 35 images is 267198693 bytes (measured with du -sb), which is about 255MB in a readable size.
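The readable figure uses binary units: 267198693 / (1024 × 1024) ≈ 254.8, which rounds to 255. A minimal Python sketch of both steps follows; `apparent_size` is a hypothetical helper that only approximates `du -sb`, as noted in its docstring.

```python
import os


def apparent_size(path: str) -> int:
    """Total size of the regular files below `path`, in bytes.

    Close to `du -sb`, but simplified: du -sb also counts the directories
    themselves and counts hard-linked files only once.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):  # skip symlinks (a simplification)
                total += os.path.getsize(full_path)
    return total


def to_mib(size_bytes: int) -> float:
    """Convert bytes to MiB (1 MiB = 1024 * 1024 bytes)."""
    return size_bytes / (1024 * 1024)


print(f"{to_mib(267198693):.1f} MiB")  # prints 254.8 MiB
```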

Size (bytes)  Program          Method   Level   Dictionary size  Word size  Solid block size
261157265     7zip             PPMd     Ultra   192mb            32         solid
262354320     7zip             bzip2    Ultra   900kb            X          128m
263087428     7zip             lzma2    Ultra   64mb             64         solid
264065788     7zip             lzma     Ultra   64mb             64         solid
264535465     zip (Debian 10)  deflate  Normal  X                X          X

As you can see, the smallest archive came from 7zip's PPMd method.
An important detail here: the difference between the best and the worst result is only about 3.4MB, but compressing with PPMd (with only 1 thread) took far longer than with the default Linux zip command.
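Expressed as a percentage of the original 267198693 bytes, the spread is small. This short Python calculation uses only the sizes from the table above; PPMd saves about 2.3% and the Debian zip about 1.0%, which fits the observation that there is little left to gain on JPEG files.

```python
ORIGINAL = 267198693  # bytes, uncompressed size measured with du -sb

# Archive sizes in bytes, copied from the image table.
results = {
    "7zip PPMd": 261157265,
    "7zip bzip2": 262354320,
    "7zip lzma2": 263087428,
    "7zip lzma": 264065788,
    "zip (Debian 10)": 264535465,
}


def saving_percent(compressed: int, original: int = ORIGINAL) -> float:
    """Space saved relative to the uncompressed size, in percent."""
    return 100 * (original - compressed) / original


for name, size in results.items():
    print(f"{name:16} {saving_percent(size):5.2f} % saved")

spread = max(results.values()) - min(results.values())
print(f"best vs worst: {spread} bytes ({spread / 1e6:.2f} MB)")
```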

To summarize: there is no clear winner here.

Comparison with Text

This test was conducted with 4677 text files in 1670 folders. The files totaled 24.8MB (34.2MB size on disk).
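Size on disk is larger because a file system allocates space in whole clusters, so every small file wastes part of its last cluster. The gap here is 9.4MB over 4677 files, roughly 2 KB per file, which is about what half of a 4 KiB cluster per file would give. The 4 KiB cluster size is an assumption, not something measured in this test; the sketch below only estimates the effect.

```python
import math


def size_on_disk(file_sizes: list[int], cluster: int = 4096) -> int:
    """Estimate allocated space when each file occupies whole clusters.

    Assumption: a fixed cluster size (4 KiB by default) and no special
    handling of tiny files, sparse files or file system compression.
    """
    return sum(math.ceil(size / cluster) * cluster for size in file_sizes)


# Hypothetical example: three small text files.
sizes = [100, 4096, 5000]
print(sum(sizes), size_on_disk(sizes))  # prints 9196 16384
```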

Unless the table lists a different level, all 7zip compressions were done with the "Compression Level" set to "Ultra" and, as mentioned above, with only 1 thread.
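The settings were chosen in the 7zip GUI. To repeat a configuration from the command line, a Python sketch like the following can build the matching 7z call. The mapping of the GUI fields to switches is an assumption: "Ultra" as -mx=9, and "Dictionary size"/"Word size" as mem/o for PPMd and d/fb for LZMA and LZMA2. The archive and folder names are hypothetical.

```python
import shlex


def sevenzip_command(archive: str, source: str, method: str,
                     dictionary: str, word_size: int) -> list[str]:
    """Build a 7z command line for one configuration (assumed mapping)."""
    if method.lower() == "ppmd":
        # Assumption: GUI "Dictionary size" -> mem, "Word size" -> o (model order).
        params = f"PPMd:mem={dictionary}:o={word_size}"
    else:
        # Assumption: for LZMA/LZMA2, "Word size" -> fb (number of fast bytes).
        params = f"{method}:d={dictionary}:fb={word_size}"
    return [
        "7z", "a", "-t7z",
        "-mx=9",          # "Ultra" compression level
        f"-m0={params}",  # compression method and its parameters
        "-ms=on",         # solid archive
        "-mmt=1",         # a single thread, as in these tests
        archive, source,
    ]


cmd = sevenzip_command("texts.7z", "texts/", "PPMd", "192m", 32)
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where the `7z` binary is installed.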

Size (bytes)  Size (MB)  Program       Method   Level       Dictionary size  Word size  Solid block size
2677842       2.6M       7zip          PPMd     Ultra       192mb            32         solid
2749614       2.7M       7zip          PPMd     Ultra       64mb             32         solid
2834220       2.8M       7zip          lzma2    Ultra       1024mb           256        solid
2843328       2.8M       7zip          lzma2    Ultra       1024mb           128        solid
2860599       2.8M       7zip          lzma     Ultra       64mb             64         solid
2861099       2.8M       7zip          lzma2    Ultra       1024mb           64         solid
2861099       2.8M       7zip          lzma2    Ultra       64mb             64         solid
2934539       2.8M       7zip          lzma2    Ultra       192mb            32         solid
3406129       3.3M       tar (Debian)  bzip2    Ultra       X                X          X
3409919       3.3M       7zip          bzip2    Ultra       900kb            X          solid
4247956       4.1M       tar (Debian)  gzip     Ultra       X                X          X
8482893       8.1M       zip (7zip)    deflate  Ultra       32kb             128        X
8487933       8.1M       zip (7zip)    deflate  Ultra       32kb             32         X
8581195       8.2M       zip (7zip)    deflate  Normal      192mb            32         X
8766921       8.4M       zip (Debian)  deflate  Ultra (9)   X                X          X
8786534       8.4M       zip (Debian)  deflate  Normal (6)  X                X          X

Every text file starts with the same "heading" text, a logo made of text characters. This explains the much bigger size of the zip archives: zip compresses every file separately, and deflate's small dictionary (32kb) is a burden for a heading that is present in every file, so the heading ends up stored again in each compressed file.
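The effect of a shared heading can be reproduced on synthetic data. This Python sketch uses 200 hypothetical files that share a 1000-character heading and compares per-file deflate (roughly what zip does), one deflate stream over all files (roughly tar plus gzip) and one LZMA stream (a solid archive). It is an illustration under those assumptions, not a re-run of the test above.

```python
import lzma
import random
import string
import zlib

rng = random.Random(42)

# Hypothetical stand-in for the heading that every file starts with.
HEADING = "".join(rng.choices(string.ascii_letters + string.punctuation, k=1000))

# 200 hypothetical text files: same heading, different body.
files = [
    (HEADING + "".join(rng.choices(string.ascii_lowercase + " ", k=500))).encode()
    for _ in range(200)
]

# zip-like: every file is compressed on its own, so the heading is
# compressed again for each file.
per_file = sum(len(zlib.compress(f, 9)) for f in files)

# tar.gz-like: one deflate stream over all files; its 32 KB window is
# large enough here to reference the heading of the previous file.
solid_deflate = len(zlib.compress(b"".join(files), 9))

# Solid archive using LZMA with a much larger dictionary.
solid_lzma = len(lzma.compress(b"".join(files)))

print(f"per-file deflate: {per_file} bytes")
print(f"solid deflate:    {solid_deflate} bytes")
print(f"solid lzma:       {solid_lzma} bytes")
```

In this sketch even deflate stores the heading only once when the files are compressed as one stream, which matches the table: tar with gzip (4.1M) lands well below every zip result (8.1M or more).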

The results fall into two main groups: 7zip's modern methods (PPMd, lzma and lzma2) at the top, all at 2.8MB or lower, and the zip (deflate) methods at the bottom, all at 8.1MB or higher. The bzip2 and gzip results sit in between.

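The default method in 7zip is LZMA2, and this comparison shows why: it works very well for text. Raising the dictionary size from 64mb to 1024mb made no difference here (2861099 bytes in both cases), but raising the word size to 256 brought LZMA2 down to 2834220 bytes, close to PPMd's 2749614 bytes with a 64mb dictionary.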
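The word size can also be explored outside of 7zip. This Python sketch uses the standard lzma module (liblzma, the xz library, not 7zip) and assumes that its `nice_len` filter option is the closest counterpart of 7zip's "Word size". The sample text is hypothetical, so the sizes it prints say nothing about the files tested here.

```python
import lzma


def compress_lzma2(data: bytes, nice_len: int, dict_size: int = 1 << 20) -> bytes:
    """Compress `data` as .xz with LZMA2 and the given nice_len.

    Assumption: nice_len is treated here as the counterpart of 7zip's
    "Word size". dict_size is kept at 1 MiB to limit memory use.
    """
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": 9 | lzma.PRESET_EXTREME,  # defaults for unspecified options
        "dict_size": dict_size,
        "nice_len": nice_len,  # larger values look for longer matches
    }]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


# Hypothetical text sample: a fixed heading followed by a varying line.
sample = "".join(
    f"=== ASCII LOGO ===\nentry {i}: value {i * i % 97}\n" for i in range(2000)
).encode()

for nice_len in (32, 64, 128, 256, 273):
    print(nice_len, len(compress_lzma2(sample, nice_len)))
```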

To summarize: the winners are 7zip's newer methods PPMd, lzma and lzma2.