On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote:
> 3. I can see where the user might be able to specify a better
> algorithm, for a particular data set.  For that, how about if we have
> a --compress-program=PROGRAM option, which lets the user plug in any
> program that works as a pipeline?  E.g., --compress-program=gzip would
> use gzip.  The default would be to use "PROGRAM -d" to decompress; we
> could have another option if that doesn't suffice.
>
> An advantage of (3) is that it should work well on two-processor
> hosts, since compression can be done in one CPU while sorting is done
> on another.  (Hmm, perhaps we should consider forking even if we use a
> built-in default compressor, for the same reason.)
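To make the proposed contract concrete: the plugged-in program must compress stdin to stdout, and "PROGRAM -d" must invert it.  Using gzip (the example from the quoted proposal) as a stand-in, a round trip through the two halves of the contract should be the identity:

```shell
# Sketch of the --compress-program contract, assuming gzip as the
# plugged-in program: "gzip" compresses stdin to stdout, and "gzip -d"
# decompresses, so piping one into the other reproduces the input.
printf 'hello\n' | gzip | gzip -d
```

Any filter with this shape (bzip2 / "bzip2 -d", etc.) would satisfy the same interface.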
I've started working on this and have made good progress so far.  There are a lot of subtleties, though, such as making sure the forked child doesn't receive SIGINT and unlink all our temp files before it execs (I've solved that problem), and making sure the compress process finishes compressing a temp file before the corresponding decompress process starts reading it (I've got a plan for that).

Anyway, my point is that I've gotten off to a good start, but it's going to take a lot of testing to make sure I've done it right, given all these race conditions.

The actual compression is obviously a lot better (using gzip / bzip2), and it shouldn't be hard to extend the code so sort can read and write externally compressed files, which is what the OP wanted.  It's not faster (not even close) on my machine, though.  Of course, I've only got one CPU, and a slow one at that :-)

Dan

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils