On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote:
> 3. I can see where the user might be able to specify a better
> algorithm, for a particular data set.  For that, how about if we have
> a --compress-program=PROGRAM option, which lets the user plug in any
> program that works as a pipeline?  E.g., --compress-program=gzip would
> use gzip.  The default would be to use "PROGRAM -d" to decompress; we
> could have another option if that doesn't suffice.
>
> An advantage of (3) is that it should work well on two-processor
> hosts, since compression can be done in one CPU while sorting is done
> on another.  (Hmm, perhaps we should consider forking even if we use a
> built-in default compressor, for the same reason.)
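To make the proposed contract concrete: the plugged-in program must compress stdin to stdout, and "PROGRAM -d" must invert it.  Using gzip (the example from the quoted proposal) as a stand-in, a round trip through the two halves of the contract should be the identity:

```shell
# Sketch of the --compress-program contract, assuming gzip as the
# plugged-in program: "gzip" compresses stdin to stdout, and "gzip -d"
# decompresses, so piping one into the other reproduces the input.
printf 'hello\n' | gzip | gzip -d
```

Any filter with this shape (bzip2 / "bzip2 -d", etc.) would satisfy the same interface.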
I've started working on this and have made good progress so far.  There are a lot of subtleties, though, such as making sure the forked child doesn't receive SIGINT and unlink all our temp files before it execs (I've solved that problem), and making sure the compress process finishes compressing a temp file before the corresponding decompress process starts reading it (I've got a plan for that).

Anyway, my point is that I've gotten off to a good start, but it's going to take a lot of testing to make sure I've done it right, given all these race conditions.

The actual compression is obviously a lot better (using gzip / bzip2), and it shouldn't be hard to extend the code so sort can read and write externally compressed files, which is what the OP wanted.  It's not faster (not even close) on my machine, though.  Of course, I've only got one CPU, and a slow one at that :-)

Dan

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils