Holimetrix Tech Blog


How we compress our logfiles

Cyril Beaufrere

At Holimetrix, we deal with millions of log lines every day. Archiving and compressing these logs was a challenge for us, so we ran a benchmark of the solutions available on the market.

Our strategy is pretty simple: make the most of some machines' CPUs at night, when they're not too busy, to achieve maximum compression and optimize our storage capacity.

We selected four challengers for their robustness and their usability within scripts: gzip, bzip2, xz and lz4.

To choose between them, we took a pragmatic approach and tested each of these tools.

We ran the tests on this reference server:

We used a small 1.3 GB log file to test the compression algorithms, each with the same approach:

dario@mirkwood:~$ ll access.log
-rw-r--r-- 1 dario dario 1356661100 Jan 30 15:50 access.log
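
A timing comparison of this kind can be approximated with Python's standard library, which ships bindings for gzip, bzip2 and xz (the `gzip`, `bz2` and `lzma` modules). This is only a minimal sketch under stated assumptions: the `CODECS` table, the `time_codec` helper, the compression levels and the sample data are all hypothetical, lz4 is left out because it is not in the standard library, and in-memory timings will differ from those of the command-line tools.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical codec table: name -> (compress, decompress).
# Levels are assumptions: gzip and bzip2 at level 9, xz at its default preset.
CODECS = {
    "gzip": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "xz": (lambda d: lzma.compress(d), lzma.decompress),
}


def time_codec(name, data):
    """Return (compress_seconds, decompress_seconds) for one codec."""
    compress, decompress = CODECS[name]

    start = time.perf_counter()
    packed = compress(data)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_seconds = time.perf_counter() - start

    # Sanity check: the round trip must give back the original bytes.
    if restored != data:
        raise RuntimeError(f"{name} round trip failed")
    return compress_seconds, decompress_seconds


if __name__ == "__main__":
    # Synthetic stand-in for a log file; a real run would read access.log instead.
    sample = b"GET /index.html 200 512\n" * 20000
    for name in CODECS:
        c, d = time_codec(name, sample)
        print(f"{name:6s} compress={c:.3f}s decompress={d:.3f}s")
```

Running every codec on the same bytes and checking the round trip keeps the comparison fair and catches a corrupted archive before its timing is trusted.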

benchmark 1

Unsurprisingly, bzip2 at its maximum compression setting takes much longer to compress our file than its competitors, and the same applies to decompression. This is not a serious issue for our servers, as we have a good part of the night to carry out these operations.

Although lz4 is a fairly recent technology, it beats gzip on its own ground, with impressive speed in both compression and decompression.

Next, we tested the aspect that matters most to us: the compression ratio.

It appears that bzip2 and xz are pretty much neck and neck. gzip and lz4 are definitely out because their compression ratio is insufficient.
We therefore chose xz for our nightly batch processing, since it is a little faster than bzip2.
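
The ratio comparison comes down to dividing the compressed size by the original size. The sketch below shows that calculation with the standard library; `space_saving` and `compare_savings` are hypothetical helpers, the levels are assumptions, and lz4 is again omitted because it needs a third-party package.

```python
import bz2
import gzip
import lzma


def space_saving(original_size, compressed_size):
    """Fraction of space saved: 0.0 means no gain, values near 1.0 mean a tiny archive."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 1.0 - compressed_size / original_size


def compare_savings(data):
    """Return {tool name: space saving} for gzip, bzip2 and xz on the same bytes."""
    sizes = {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data)),  # default preset, an assumption
    }
    return {name: space_saving(len(data), size) for name, size in sizes.items()}
```

Highly repetitive test data compresses far better than real logs, so figures obtained on synthetic input should not be compared with those in this post.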

When taking a closer look at the xz options, we discovered the "extreme" mode:

-e, --extreme   try to improve compression ratio by using more CPU time;
                does not affect decompressor memory requirements

We ran a small test to compare performance with and without extreme mode.
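
For readers who script in Python, the `lzma` module exposes the same switch as the `PRESET_EXTREME` flag, which is OR-ed with the preset level. The sketch below is illustrative only: `xz_compress` is a hypothetical helper, and level 6 (the xz default) is an assumption, since the level used in the test is not stated here.

```python
import lzma
import time


def xz_compress(data, level=6, extreme=False):
    """Compress bytes with xz at `level`, optionally with the extreme flag (like `xz -e`).

    Returns (compressed_bytes, elapsed_seconds).
    """
    preset = (level | lzma.PRESET_EXTREME) if extreme else level
    start = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    return packed, time.perf_counter() - start
```

Calling it twice on the same input, once with `extreme=True`, gives both sizes and both timings. Either output decompresses with a plain `lzma.decompress`, because the flag only changes how hard the encoder searches.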

With extreme mode on, compression time is multiplied by almost 6. But the compression ratio also improves dramatically: our initial 1.3 GB file drops to 23 MB, a size reduction of over 98%!
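
The 98% figure can be checked from the file size in the listing above. The only assumption is that "23 MB" means 23 × 10^6 bytes; reading it as 23 MiB changes the result by about a tenth of a percentage point.

```python
ORIGINAL_BYTES = 1_356_661_100  # size of access.log in the listing above
COMPRESSED_BYTES = 23 * 10**6   # "23 MB", assumed to be decimal megabytes

remaining = COMPRESSED_BYTES / ORIGINAL_BYTES  # share of the original size left
saving = 1 - remaining                         # share of the original size removed

print(f"remaining: {remaining:.1%}, saved: {saving:.1%}")
# prints "remaining: 1.7%, saved: 98.3%"
```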

Despite this impressive result, we decided to use xz without extreme mode to keep compression and decompression speed reasonable.
We'll certainly keep an eye on lz4, which is quite impressive for such a young product and could soon become an interesting alternative to xz.
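
As an illustration of the choice described above (xz without extreme mode, run as a nightly batch), a compression step could look like the following. It is a sketch, not the production script: `compress_log` is a hypothetical helper, level 6 is an assumption, and scheduling (for example through cron) is left out.

```python
import lzma
import os
import shutil


def compress_log(path, level=6):
    """Stream `path` into `path + '.xz'`, then delete the original file.

    The file is copied in chunks, so memory use does not grow with the log size.
    """
    target = path + ".xz"
    with open(path, "rb") as src, lzma.open(target, "wb", preset=level) as dst:
        shutil.copyfileobj(src, dst)
    # Only reached if the copy succeeded; an exception above leaves the original in place.
    os.remove(path)
    return target
```

Streaming matters for files of this size: reading a 1.3 GB log into memory just to compress it would be wasteful on a server that is also doing other work.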

Fan of all that relates to high availability, Linux and the web in general. Eater of books and series, I love to play a little guitar and do some photography.
