When working on a cluster system, I get used to the idea of
doing things in parallel, especially when that parallelism gets
my work done more quickly! While I'm *NOT*
going to discuss compression and decompression across multiple
machines (which /could/ be cool), I am going to talk about
multi-threaded implementations of some popular compression
algorithms that we *NIX weenies have grown to love...

Gzip is a very common compression utility in the *NIX world and has
probably been the most widely used compression format for at least
the last 15 years or so (history buffs may feel free to contact me
with more complete information!). Gzip uses the DEFLATE
compression algorithm specified in RFC 1951 and the GZIP file
format specified in RFC 1952.
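
To make the split between the two RFCs concrete, here is a minimal sketch using only Python's standard library (the sample data is made up for illustration). It checks the header and trailer fields that RFC 1952 describes and, when no optional header fields are present, pulls out the raw DEFLATE stream in the middle.

```python
import gzip
import struct
import zlib

data = b"parallel compression is fun! " * 1000

# Compress into a single gzip member using the standard library.
blob = gzip.compress(data)

# RFC 1952 header: ID1, ID2, CM (compression method), FLG (flags).
id1, id2, method, flags = blob[0], blob[1], blob[2], blob[3]
assert (id1, id2) == (0x1F, 0x8B)   # gzip magic number
assert method == 8                  # 8 means DEFLATE (RFC 1951)

# RFC 1952 trailer: CRC-32 and length (mod 2**32) of the uncompressed
# data, both stored little-endian in the last 8 bytes.
crc32, isize = struct.unpack("<II", blob[-8:])
assert crc32 == zlib.crc32(data)
assert isize == len(data) % 2**32

# With no optional header fields (FLG == 0) the header is exactly
# 10 bytes, so everything between header and trailer is raw DEFLATE.
if flags == 0:
    raw_deflate = blob[10:-8]
    # wbits=-15 tells zlib to expect a raw DEFLATE stream, no wrapper.
    assert zlib.decompress(raw_deflate, -15) == data
```

In other words, RFC 1952 is only a thin wrapper (magic number, flags, checksum, length) around the RFC 1951 payload.
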
The parallel implementation of Gzip called "pigz" is available
here. This implementation is
done by the fine folks at zlib.org and, in my tests, is 100%
compatible with the standard gzip binary on my system. In other
words, I can compress files with gzip and decompress them with
unpigz, or compress them with pigz and decompress them with gzip.
On my system, pigz keeps all of the available CPUs at nearly 100%
utilization (assuming it is not I/O bound!).
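
One way to see why a parallel compressor can stay gzip-compatible is the sketch below. It is a stand-in technique, not pigz's own method: each chunk is compressed on a thread pool into its own gzip member, and the members are simply concatenated, which the gzip format allows. The function name `parallel_gzip` and the 128 KiB chunk size are assumptions made for this example.

```python
import gzip
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 128 * 1024  # arbitrary chunk size chosen for this sketch


def parallel_gzip(data, workers=None):
    """Compress fixed-size chunks on a thread pool.

    Returns the concatenation of one gzip member per chunk.
    """
    workers = workers or os.cpu_count() or 1
    chunks = [data[i:i + CHUNK_SIZE]
              for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the members line up
        # with the original data even if workers finish out of order.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


payload = os.urandom(64 * 1024) + b"cluster " * 200_000
packed = parallel_gzip(payload)

# A standard gzip decoder reads the members back to back.
assert gzip.decompress(packed) == payload
```

Writing `packed` to a file should give something the ordinary gzip binary can decompress, since it also accepts concatenated members. Compressing chunks independently usually costs a little compression ratio, because each chunk starts without any history from the previous one.
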
XZ is a compression utility using the LZMA2 compression format.
The parallel implementation of xz called "pxz" is available here.
Info about one implementation can be
This implementation requires Cilk Plus to compile.
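
The same chunk-and-concatenate idea carries over to xz, because a .xz file may consist of several complete streams back to back. The sketch below uses Python's standard `lzma` module with an explicit LZMA2 filter. Whether pxz splits its input exactly this way is an assumption here, and the helper name `xz_chunk` is made up for the example.

```python
import lzma

# Explicit filter chain: a single LZMA2 filter at preset 6
# (the default compression level of the xz tool).
LZMA2_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def xz_chunk(chunk):
    """Compress one chunk into a complete, standalone .xz stream."""
    return lzma.compress(chunk, format=lzma.FORMAT_XZ,
                         filters=LZMA2_FILTERS)


text = b"we *NIX weenies love compression\n" * 50_000
half = len(text) // 2

# Two independent streams; in a parallel tool each one could be
# produced by its own thread.
streams = [xz_chunk(text[:half]), xz_chunk(text[half:])]
combined = b"".join(streams)

assert combined[:6] == b"\xfd7zXZ\x00"    # .xz magic bytes
assert lzma.decompress(combined) == text  # streams decode as one file
```
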