On 5 March 2018 at 14:54, Mark Adler wrote:
> deflate has an inherent latency that accumulates enough data in order
> to efficiently emit each deflate block. You can deliberately flush
> (with zlib, not gzip), but if you do that too frequently, e.g. each
> line, then you will get lousy compression or even expansion.

Even if the main repetition is between the lines? For instance, if 80% of half of the lines, and 70% of the other half, are the same, as in a while loop running only ping and date? I thought of it as a very lazy way of not having to strip out all the redundant output caused by the use of ASCII and by the same words or similar patterns occurring over and over.

> I wrote something called gzlog
> (https://github.com/madler/zlib/blob/master/examples/gzlog.h),
> intended to solve this problem. It can take a small amount of input,
> e.g. a line, and update the output gzip file to be complete and valid
> after each line, yet also get good compression in the long run. It
> does this by writing the lines to the log.gz file effectively
> uncompressed (deflate has a “stored” block type), until it has
> accumulated, say, 1 MB of data. Then it goes back and compresses that
> uncompressed 1 MB, again always leaving the gzip file in a valid
> state. gzlog also maintains something like a journal, which allows
> gzlog to repair the gzip file if the last operation was interrupted,
> e.g. by a power failure.

I was rather looking for a tool that could be used as a utility (since this is for a quick-and-dirty, high-level, low-frequency, medium-term task) rather than something in C, yet it is quite interesting, at least as a demonstration of the flexibility of gzip…

>> #!/bin/bash
>> while ping -c1 gnu.org ; do
>>     date --rfc-3339=seconds
>>     sleep 30
>> done | gzip -9 -f | tee sample.log | zcat

Maybe the only way to go is simply to gzip everything each time the log is rotated, the standard way, if that pipe cannot work even when every line is almost the same…
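
For anyone wanting to measure the flush trade-off described in the quoted reply rather than guess at it, here is a minimal sketch using Python's zlib binding. It is only an illustration, not gzlog and not anything from the original thread: `make_lines`, `compress_lines` and `readable_prefix` are hypothetical helper names, and the sample lines are an invented stand-in for ping and date output. It compresses the same near-identical lines once as a single gzip stream and once with a `Z_SYNC_FLUSH` after every line, and checks that a flushed prefix can already be decompressed.

```python
import zlib


def make_lines(n):
    # Invented stand-in for the ping + date loop: lines that differ only
    # in a few digits (assumption, not real ping output).
    return [
        f"64 bytes from gnu.org: icmp_seq=1 ttl=53 time={20 + i % 7}.{i % 10} ms "
        f"2018-03-05 14:{i // 2 % 60:02d}:{30 * (i % 2):02d}+01:00\n".encode()
        for i in range(n)
    ]


def compress_lines(lines, flush_each_line):
    # wbits=31 asks zlib to wrap the deflate data in a gzip header/trailer.
    comp = zlib.compressobj(level=9, wbits=31)
    out = bytearray()
    for line in lines:
        out += comp.compress(line)
        if flush_each_line:
            # Force everything fed so far out to a byte boundary.
            out += comp.flush(zlib.Z_SYNC_FLUSH)
    out += comp.flush()  # default mode is Z_FINISH: ends the stream
    return bytes(out)


def readable_prefix(lines, count):
    # Compress only the first `count` lines, sync-flushing after each one
    # and never finishing the stream, then decompress what exists so far.
    comp = zlib.compressobj(level=9, wbits=31)
    partial = bytearray()
    for line in lines[:count]:
        partial += comp.compress(line) + comp.flush(zlib.Z_SYNC_FLUSH)
    return zlib.decompressobj(wbits=31).decompress(bytes(partial))


lines = make_lines(200)
print("raw bytes:          ", sum(len(line) for line in lines))
print("single stream:      ", len(compress_lines(lines, False)))
print("flush on every line:", len(compress_lines(lines, True)))
```

The sync flush keeps the sliding window, so repetition between lines can still be matched; what it costs is the extra block boundary and flush marker on every line. How much that matters depends on the data, so the sizes printed above are worth checking against real log lines before deciding between a per-line flush and compressing at rotation time.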