bug#30719: Progressively compressing piped input

Garreau, Alexandre Tue, 06 Mar 2018 14:08:37 -0800

On 05/03/2018 at 14:54, Mark Adler wrote:
> deflate has an inherent latency that accumulates enough data in order
> to efficiently emit each deflate block. You can deliberately flush
> (with zlib, not gzip), but if you do that too frequently, e.g. each
> line, then you will get lousy compression or even expansion.


Even if most of the repetition is between lines? For example, if 80% of
one half of each line and 70% of the other half are identical from one
line to the next, as in a while loop that only runs ping and date? I
thought of it as a very lazy way to avoid having to strip all the
redundant output caused by the use of ASCII and by the same words or
similar patterns occurring over and over.
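
To make the question concrete, here is a minimal sketch using Python's
zlib module (my choice of tool, not something proposed in this thread).
It compresses synthetic ping-like lines (invented for illustration)
twice: once flushing with Z_SYNC_FLUSH after every line, and once in a
single pass. As far as I understand, Z_SYNC_FLUSH keeps the compression
history, so repetition between lines can still be matched, but every
flush adds a few bytes of overhead.

```python
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # 16 + window bits selects the gzip wrapper

def compress_per_line(lines):
    """Compress into one gzip stream, flushing after every line."""
    comp = zlib.compressobj(9, zlib.DEFLATED, GZIP_WBITS)
    out = bytearray()
    for line in lines:
        out += comp.compress(line)
        out += comp.flush(zlib.Z_SYNC_FLUSH)  # make this line readable now
    out += comp.flush(zlib.Z_FINISH)          # write the gzip trailer
    return bytes(out)

def compress_once(lines):
    """Compress all lines in a single pass, with no intermediate flush."""
    comp = zlib.compressobj(9, zlib.DEFLATED, GZIP_WBITS)
    return comp.compress(b"".join(lines)) + comp.flush()

# Synthetic, highly repetitive sample lines (hypothetical data).
lines = [
    b"64 bytes from gnu.org: icmp_seq=1 ttl=52 time=%d.%d ms\n"
    % (20 + i % 7, i % 10)
    for i in range(500)
]
raw = b"".join(lines)
per_line = compress_per_line(lines)
once = compress_once(lines)
print(len(raw), len(per_line), len(once))

# A reader can decode what has been flushed so far, before the stream ends.
comp = zlib.compressobj(9, zlib.DEFLATED, GZIP_WBITS)
partial = b"".join(
    comp.compress(line) + comp.flush(zlib.Z_SYNC_FLUSH) for line in lines[:3]
)
seen = zlib.decompressobj(GZIP_WBITS).decompress(partial)
```

With input this repetitive, the per-line stream should still come out
smaller than the raw text, only clearly larger than the single-pass
stream, which matches the trade-off described above.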

> I wrote something called gzlog
> (https://github.com/madler/zlib/blob/master/examples/gzlog.h),
> intended to solve this problem. It can take a small amount of input,
> e.g. a line, and update the output gzip file to be complete and valid
> after each line, yet also get good compression in the long run. It
> does this by writing the lines to the log.gz file effectively
> uncompressed (deflate has a “stored” block type), until it has
> accumulated, say, 1 MB of data. Then it goes back and compresses that
> uncompressed 1 MB, again always leaving the gzip file in a valid
> state. gzlog also maintains something like a journal, which allows
> gzlog to repair the gzip file if the last operation was interrupted,
> e.g. by a power failure.

I was rather looking for a tool that could be used as a utility (since
this is for a dirty, high-level, low-frequency, medium-term task) than
for a C library, yet that’s quite interesting, at least in demonstrating
the flexibility of gzip…

>> #!/bin/bash
>> while ping -c1 gnu.org ; do
>>    date --rfc-3339=seconds
>>    sleep 30
>> done | gzip -9 -f | tee sample.log | zcat

Maybe the only way to go is just to gzip everything each time a log is
rotated, the standard way, if that pipe approach cannot work even with
each line being almost the same…
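
A middle ground between flushing every line and waiting for rotation
might be to compress in batches. The sketch below is only an
illustration in Python (append_batches and batch_size are names I made
up, and this is not what gzlog does): it appends each batch of lines as
its own gzip member. A gzip file may consist of several concatenated
members, so the result decompresses as one continuous log.

```python
import gzip
import io

def append_batches(lines, sink, batch_size=100):
    """Append lines to sink as a series of gzip members, one per batch."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) >= batch_size:
            # Each write is a complete gzip member, so the file is valid
            # after every batch.
            sink.write(gzip.compress(b"".join(batch), compresslevel=9))
            batch.clear()
    if batch:  # write the final, possibly shorter, batch
        sink.write(gzip.compress(b"".join(batch), compresslevel=9))

# Synthetic sample lines (hypothetical data).
lines = [
    b"64 bytes from gnu.org: icmp_seq=1 ttl=52 time=%d.%d ms\n"
    % (20 + i % 7, i % 10)
    for i in range(500)
]
sink = io.BytesIO()  # stands in for a log file opened in append mode
append_batches(lines, sink)
restored = gzip.decompress(sink.getvalue())  # reads all members in order
```

The price is that lines still waiting in the current batch are not on
disk yet, and each member starts with no history from the previous one.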
