Hi parallel users,
Background
EXAMPLE: Processing a big file using more CPUs
To process a big file or some output you can use --pipe to split up the
data into blocks and pipe the blocks into the processing program.
If the program is gzip -9 you can do:
cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
This will split bigfile into blocks of 1 MB and pass that to gzip -9 in
parallel. One gzip will be run per CPU. The output of gzip -9 will be
kept in order and saved to bigfile.gz
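The block size can also be set explicitly with --block; for example (10M here is just an arbitrary value, not the one I am after):

cat bigfile | parallel --pipe --recend '' -k --block 10M gzip -9 > bigfile.gz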
Question
I would like to create blocks of a suitable size for each CPU/thread for
binary files, as is possible with --pipepart --block -1 for text files
(with lines).
I have tried, but I can only get the default 1 MiB block size.
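For a text file, something like this does what I want, if I read the
manual correctly (bigfile and gzip -9 are just placeholders):

parallel -k --pipepart --block -1 -a bigfile gzip -9 > bigfile.gz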
The reason I want this is that I often create compressed images of the
contents of a drive, /dev/sdx, and I lose approximately half of the
compression improvement from gzip to xz when using parallel. The
improvement in speed is good, 2.5 times, but I think larger blocks would
give xz a chance to get compression much closer to what it can achieve
without parallel.
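As a sketch of the idea, this is roughly what I want to express, with the
block size computed by hand from the device size and rounded up so there
is no tiny leftover block (blockdev and nproc are just what I happen to
use; xz -9 is an example):

SIZE=$(blockdev --getsize64 /dev/sdx)
BLOCK=$((SIZE / $(nproc) + 1))
cat /dev/sdx | parallel --pipe --recend '' -k --block $BLOCK xz -9 > sdx.img.xz

But as far as I understand, with --pipe each block is buffered, so for a
whole drive the blocks quickly become too large, which is why a
--pipepart-style solution for block devices would be nicer.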
Is it possible with the current code? If so, how?
Otherwise I think it would be a good idea to modify the code to make it
possible.
Best regards
Nio