Parallel (de)compression

When working on a cluster system, I get used to the idea of doing things in parallel, especially when that parallelism gets my work done more quickly! While I'm *NOT* going to discuss compression and decompression on multiple machines (which /could/ be cool), I am going to talk about multi-threaded implementations of some popular compression tools that we *NIX weenies have grown to love...

Parallel Gzip

Gzip is a very common compression utility in the *NIX world, and its format has probably been the most widely used compression format for at least the last 15 years or so (history buffs may feel free to contact me with more complete information!). Gzip uses the DEFLATE compression algorithm specified in RFC 1951 and the GZIP file format specified in RFC 1952.

The parallel implementation of Gzip, called "pigz", is available here. This implementation is done by the fine folks at zlib.org and, from my tests, is 100% compatible with the standard gzip binary on my system. In other words, I can compress files with gzip and decompress them with unpigz, or compress them with pigz and decompress them with gzip.

On my system, pigz keeps all of the available CPUs at nearly 100% utilization (assuming it is not I/O bound!).
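To make the idea of multi-threaded gzip compression a bit more concrete, here is a minimal Python sketch. It is NOT how pigz itself is implemented; it is a simplified stand-in that cuts the input into chunks, compresses each chunk as its own gzip member on a thread pool, and concatenates the results. The function name parallel_gzip and the CHUNK_SIZE value are hypothetical choices made for this illustration. It leans on the fact that the gzip format allows several members to be concatenated into one file.

```python
import gzip
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Illustrative chunk size; an assumption of this sketch, not a pigz setting.
CHUNK_SIZE = 128 * 1024


def parallel_gzip(data: bytes, workers: Optional[int] = None) -> bytes:
    """Compress data as a sequence of independent gzip members, one per chunk."""
    workers = workers or os.cpu_count() or 1
    # Split the input into fixed-size chunks; keep one empty chunk for empty input
    # so the result is still a valid gzip file.
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    # The compression work happens in C code, so threads are a reasonable fit here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the members stay in sequence.
        return b"".join(pool.map(gzip.compress, chunks))


if __name__ == "__main__":
    payload = b"cluster systems like doing things in parallel\n" * 100_000
    packed = parallel_gzip(payload)
    # A standard gzip decoder reads all concatenated members back to back.
    assert gzip.decompress(packed) == payload
    print(len(payload), "bytes in,", len(packed), "bytes out")
```

Because every chunk is compressed without knowledge of its neighbours, the output of this sketch will usually be slightly larger than what a single-stream compressor produces. That is the price paid for keeping the chunks fully independent.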

Parallel XZ

XZ is a compression utility that uses the LZMA2 compression algorithm and the .xz file format.

The parallel implementation of xz called "pxz" is available here.
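Chunked, multi-threaded compression can also be sketched for the .xz format using Python's standard lzma module. This is only an illustration under stated assumptions, not a description of pxz internals; parallel_xz and its chunk_size default are hypothetical names and values picked for the example. It relies on the documented behaviour that lzma.decompress accepts several concatenated streams and returns their combined contents.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def parallel_xz(data: bytes, chunk_size: int = 1 << 20,
                workers: Optional[int] = None) -> bytes:
    """Compress data as concatenated, independently compressed .xz streams."""
    # One empty chunk for empty input keeps the output a valid .xz file.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    # workers=None lets ThreadPoolExecutor pick its own default pool size.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # lzma.compress() defaults to the .xz container; map() preserves order.
        return b"".join(pool.map(lzma.compress, chunks))


if __name__ == "__main__":
    payload = b"LZMA2 trades CPU time for a better ratio\n" * 100_000
    packed = parallel_xz(payload)
    # lzma.decompress() handles the concatenated streams in one call.
    assert lzma.decompress(packed) == payload
```

Larger chunks generally compress better with LZMA2, since each stream starts with an empty dictionary, but they also leave fewer pieces to spread across the CPUs.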

Parallel bzip2

Info about one implementation can be found here. This implementation requires Cilk Plus to compile.