Here are some other approaches which may help:

1. Use gzopen() from zlib to compress the 10GB file as it is generated.
This uses only one CPU core and involves only sequential writing
(no random writes), but that may be enough in some cases.

2. The output from gzip is written 32 KiB at a time, so a large output
file is grown many times.  Buffering the output from gzip into larger
blocks may therefore help, too.  Try:
        gzip ...  |  dd obs=... of=...

3. Similarly, dd can buffer the input to gzip:
        dd if=... ibs=... obs=...  |  gzip ...

4. dd can also be used to create multiple streams of input
from a single file:
        (dd if=file ibs=... skip=0*N count=N obs=...  |  gzip ... ) &
        (dd if=file ibs=... skip=1*N count=N obs=...  |  gzip ... ) &
        (dd if=file ibs=... skip=2*N count=N obs=...  |  gzip ... ) &
        (dd if=file ibs=... skip=3*N count=N obs=...  |  gzip ... ) &
However, dd does not perform arithmetic, so each product j*N
(skip=0*N, skip=1*N, ...) must be written out as a literal number;
for example, with N=1048576 the third stream needs skip=2097152.
(GNU dd does accept an x multiplier in numeric arguments,
e.g. skip=2x1048576, but that is not portable.)
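Returning to approach 1: the streaming pattern behind gzopen()/gzwrite()
can be sketched with Python's standard gzip module, which wraps the same
zlib machinery.  This is an illustrative sketch only; the output filename
and record format are assumptions, not from the original post.

```python
import gzip

# Approach 1 sketched in Python: compress records as they are generated,
# in one sequential stream, on one core, with no random writes.
# "out.gz" and the "record N" format are illustrative assumptions.
with gzip.open("out.gz", "wt") as out:
    for i in range(1000):
        out.write(f"record {i}\n")

# Reading back decompresses transparently, also sequentially.
with gzip.open("out.gz", "rt") as f:
    first = f.readline()
```

The key property is the same as with gzopen() in C: the data is
compressed as it is produced, so the full uncompressed 10GB never has
to exist on disk at once.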

The dd utility program is quite versatile!
