Hello,
Some time ago I reported a bug in which rsync (all versions since 2.5) behaves nondeterministically when the same file appears in multiple sources. Sometimes the file from the first source was copied; other times it was copied from one of the other sources.
The attached mstest.tgz contains a test that reproduces the behaviour on Darwin and Solaris.
The bug did *not* show up in GNU/Linux builds of rsync, for the reason explained below:
rsync uses the C library function qsort() to sort the complete file list built from all files of all sources. qsort is not guaranteed to be stable, meaning that it does not guarantee to preserve the original order of items that compare as equal. Our test case triggers a situation where this instability shows up.
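To illustrate what "compare as equal" means here, below is a minimal sketch. The struct and function names are hypothetical and much simpler than rsync's real file list. A comparator that looks only at the name sees two entries with the same name from different sources as equal, so qsort may return them in either order. Adding the source index as a tie-breaker is one way to make the order fully defined. This is only an illustration of the problem, not what our patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified file-list entry; not rsync's real structure. */
struct entry {
    const char *name;   /* path of the file relative to its source */
    int source;         /* index of the source argument it came from */
};

/*
 * Compare by name first. A name-only comparator would stop after strcmp()
 * and report same-named entries from different sources as equal, leaving
 * their relative order up to the qsort implementation. The tie-break on
 * the source index removes that freedom.
 */
int cmp_name_then_source(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    int c = strcmp(x->name, y->name);

    if (c != 0)
        return c;
    return (x->source > y->source) - (x->source < y->source);
}

/* Sort a list so that, for equal names, the earlier source comes first. */
void sort_entries(struct entry *list, size_t n)
{
    assert(list != NULL || n == 0);
    if (n > 1)
        qsort(list, n, sizeof list[0], cmp_name_then_source);
}
```

With this comparator no two distinct entries compare as equal, so the result is the same on every platform regardless of how qsort is implemented.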
Why does it not happen in the GNU/Linux versions?
Reading the man pages showed us that glibc has an "optimization" in qsort: if memory is not low, it uses mergesort instead, which is a stable sort algorithm.
Fix:
Since our use of rsync relies on deterministic behaviour, we patched rsync to always use mergesort when composing the file list. For systems whose C library has no mergesort function (most OSes except FreeBSD/Darwin), we took the FreeBSD implementation of mergesort and put it in the rsync source tree. Patches (relative to 2.6.2) and source are attached.
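For readers who do not want to unpack the patches, here is a rough sketch of what a stable sort with a qsort-like interface looks like. It is a plain top-down merge sort written for this mail, not the FreeBSD code and not the code in the attached patch; stable_sort, struct entry and cmp_name are hypothetical names. The key point is the merge step: on a tie the element from the left run is taken first, so entries with equal names keep their original (source) order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int (*cmp_fn)(const void *, const void *);

/* Hypothetical, simplified file-list entry; not rsync's real structure. */
struct entry {
    const char *name;
    int source;         /* index of the source argument it came from */
};

/* Name-only comparison: same-named entries compare as equal. */
int cmp_name(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return strcmp(x->name, y->name);
}

/* Merge the sorted runs [lo, mid) and [mid, hi) via tmp, then copy back. */
static void merge_runs(char *base, char *tmp, size_t lo, size_t mid,
                       size_t hi, size_t size, cmp_fn cmp)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi) {
        /* Take from the right run only if it is strictly smaller;
         * on a tie the left element goes first, which keeps the sort stable. */
        if (cmp(base + j * size, base + i * size) < 0) {
            memcpy(tmp + k * size, base + j * size, size);
            j++;
        } else {
            memcpy(tmp + k * size, base + i * size, size);
            i++;
        }
        k++;
    }
    /* Copy whatever is left; at most one of the two runs is non-empty. */
    memcpy(tmp + k * size, base + i * size, (mid - i) * size);
    k += mid - i;
    memcpy(tmp + k * size, base + j * size, (hi - j) * size);
    memcpy(base + lo * size, tmp + lo * size, (hi - lo) * size);
}

static void sort_range(char *base, char *tmp, size_t lo, size_t hi,
                       size_t size, cmp_fn cmp)
{
    size_t mid;

    if (hi - lo < 2)
        return;
    mid = lo + (hi - lo) / 2;
    sort_range(base, tmp, lo, mid, size, cmp);
    sort_range(base, tmp, mid, hi, size, cmp);
    merge_runs(base, tmp, lo, mid, hi, size, cmp);
}

/* Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int stable_sort(void *base, size_t nmemb, size_t size, cmp_fn cmp)
{
    char *tmp;

    assert(base != NULL || nmemb == 0);
    if (nmemb < 2 || size == 0)
        return 0;
    if (nmemb > SIZE_MAX / size)
        return -1;
    tmp = malloc(nmemb * size);   /* the extra memory that qsort avoids */
    if (tmp == NULL)
        return -1;
    sort_range(base, tmp, 0, nmemb, size, cmp);
    free(tmp);
    return 0;
}
```

Called as stable_sort(list, n, sizeof list[0], cmp_name), this leaves same-named entries in the order in which the sources were given. The temporary buffer of nmemb * size bytes is the memory cost mentioned below.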
I want to share this with the public and propose changing rsync to use mergesort instead of qsort. If this is not acceptable for the mainline because mergesort needs more memory, I propose giving users a command-line switch to decide whether they want the feature (preferring reliability over performance in some scenarios) or not.
I hope this will be heard.
Thanks, Dirk.
mstest.tgz
Description: GNU Zip compressed data
patches.tgz
Description: GNU Zip compressed data