[ CC ++ [EMAIL PROTECTED] ]

On Tue, Nov 11, 2008 at 2:58 PM, Andrew McGill <[EMAIL PROTECTED]> wrote:
> What would you expect this to do --:
>
>     find -type f -print0 |
>         xargs -0 -n 8 --max-procs=16 md5sum >& ~/md5sums

Produce a race condition :)    It runs up to 16 md5sum processes in
parallel, each writing to the md5sums file.  Unfortunately, sometimes
two writes land at the same offset in the output file, so one
overwrites the other.  To illustrate how the shell opens the file for
> and for >>:

~$ strace -f -e open,fork,execve sh -c "echo hello > foo"
execve("/bin/sh", ["sh", "-c", "echo hello > foo"], [/* 39 vars */]) = 0
[...]
open("foo", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
~$ strace -f -e open,fork,execve sh -c "echo hello >> foo"
execve("/bin/sh", ["sh", "-c", "echo hello >> foo"], [/* 39 vars */]) = 0
[...]
open("foo", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3

This version should be race-free, because >> opens the file with
O_APPEND, so each write goes to the current end of the file:

find -type f -print0 |
     xargs -0 -n 8 --max-procs=16 md5sum >> ~/md5sums 2>&1
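
A rough way to check the append behaviour on your own system, assuming
an xargs that supports -P (the short form of --max-procs): run many
short writers in parallel against one file opened with >> and confirm
that no line goes missing.  The scratch directory, the count of 200 and
the variable names are arbitrary choices for illustration only.

```shell
# Scratch directory so the demonstration does not touch real files.
tmpdir=$(mktemp -d)
out="$tmpdir/appended"

# 200 one-line writers, up to 8 at a time.  The shell opens "$out" once
# with O_APPEND and every echo process inherits that descriptor.
seq 1 200 | xargs -n 1 -P 8 echo >> "$out" 2>&1

# Count distinct lines; 200 means no output was overwritten.
append_count=$(sort -u "$out" | wc -l | tr -d ' ')
echo "distinct lines: $append_count"
```

Swapping >> for > in the same script is the experiment that may lose
lines, although with writers this short the race can be hard to
trigger and the outcome depends on the kernel and on timing.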

I think that writing into a pipe should be OK, since pipes are
non-seekable.  However, with a pipe you can still have a problem if a
process writes more than PIPE_BUF bytes at once: such a write is not
guaranteed to be atomic, so output from different processes can be
interleaved.
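
For reference, the PIPE_BUF limit can be queried with getconf, and the
pipe variant can be tried in the same spirit.  This is only a sketch:
the writers here emit a few bytes each, far below PIPE_BUF, so it says
nothing about md5sum runs whose buffered output exceeds the limit.  The
file and variable names are made up for the example.

```shell
# PIPE_BUF for pipes created under /; POSIX guarantees at least 512.
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF is $pipe_buf bytes"

tmpdir2=$(mktemp -d)

# All writers share one pipe; a single cat drains it into the file.
seq 1 200 | xargs -n 1 -P 8 echo 2>&1 | cat > "$tmpdir2/piped"

# Count distinct lines that made it through the pipe.
pipe_count=$(sort -u "$tmpdir2/piped" | wc -l | tr -d ' ')
echo "distinct lines via pipe: $pipe_count"
```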


> Is there a correct way to do md5sums in parallel without having a shared
> output buffer which eats output (I presume) -- or is losing output when
> haphazardly combining output streams actually strange and unusual?

I hope the solution above solves your problem - and please follow up
if so.  This example is probably worthy of being mentioned in the
xargs documentation, too.

Thanks for your comment!

James.


_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
