On Tue, Feb 10, 2004 at 10:11:09AM +1100, Donovan Baarda wrote:
> On Tue, 2004-02-10 at 07:48, Jason M. Felice wrote:
> > This patch adds the --link-by-hash=DIR option, which hard links received
> > files in a link farm arranged by MD4 file hash. The result is that the system
> > will only store one copy of the unique contents of each file, regardless of
> > the file's name.
>
> Does this mean it also automatically detects renames?

No. It can't detect whether two files have identical contents until after
the file has been transferred. This patch can only save disk space, not
bandwidth.

> >
> > Anyone have an example of an MD4 collision so I can test that case? :)
>
> How do you recover from that case?

Files in the link farm are arranged like so:

    <DIR>/<hash-first-8>/<hash-last-24>/<n>

  <DIR>           is the directory supplied to the --link-by-hash=DIR option.
  <hash-first-8>  is the first eight hex digits of the file's MD4 sum.
  <hash-last-24>  is the last 24 hex digits of the file's (32-digit) MD4 sum.
  <n>             is an integer, starting from 0.

Theoretically, if two files with different contents have the same MD4 hash,
they will be assigned consecutive numbers for <n>.

Oh, and to raise the bar, the sample MD4 collision files need to be the same
length. :)

I did a little research, and there were claims that MD4 had been "cracked";
I assume this means that there is some way other than brute force to find a
file which collides with a given example. I can't seem to find any examples.

> >
> > Patch Summary:
> >
> > -1 +1 Makefile.in
> > -0 +304 hashlink.c (new)
> > -1 +21 options.c
> > -0 +6 proto.h
> > -5 +21 receiver.c
> > -0 +6 rsync.c
> > -0 +7 rsync.h
>
> If this does everything I think it does, then it's a surprisingly small
> amount of changes for what it does.

It seems to be big enough to do what it does. :)

-- 
Jason M. Felice
Cronosys, LLC <http://www.cronosys.com/>
216.221.4600 x302
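The patch's hashlink.c is not reproduced in this thread, so the following is only a sketch of how the <DIR>/<hash-first-8>/<hash-last-24>/<n> layout described in the message could be computed and searched. The helper names (digest_to_hex, link_farm_path, same_contents, find_slot) are hypothetical, and the assumption that a byte-for-byte comparison decides which <n> a received file belongs to is mine; the real patch may decide this differently.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: format a 16-byte MD4 digest as 32 lowercase hex digits. */
static void digest_to_hex(const unsigned char digest[16], char hex[33])
{
    for (int i = 0; i < 16; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);
    hex[32] = '\0';
}

/* Hypothetical helper: build <dir>/<hash-first-8>/<hash-last-24>/<n>.
 * Returns snprintf's result so the caller can detect truncation. */
static int link_farm_path(char *buf, size_t bufsize, const char *dir,
                          const unsigned char digest[16], unsigned n)
{
    char hex[33];
    digest_to_hex(digest, hex);
    return snprintf(buf, bufsize, "%s/%.8s/%s/%u", dir, hex, hex + 8, n);
}

/* Hypothetical helper: return 1 if both files open and have identical bytes
 * (and therefore identical length), 0 otherwise. */
static int same_contents(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int same = (fa != NULL && fb != NULL);
    while (same) {
        int ca = fgetc(fa), cb = fgetc(fb);
        if (ca != cb)
            same = 0;          /* differing byte, or one file ended early */
        else if (ca == EOF)
            break;             /* both ended together: contents match */
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

/* Hypothetical: walk <n> = 0, 1, 2, ... under the file's hash directory.
 * Stops at the first entry whose contents match `received` (*exists = 1, the
 * caller would hard link to it) or at the first unused slot (*exists = 0, the
 * caller would store the file there). Returns the slot, or -1 if the path
 * did not fit in `path`. */
static int find_slot(const char *dir, const unsigned char digest[16],
                     const char *received, char *path, size_t pathsize,
                     int *exists)
{
    struct stat st;
    for (unsigned n = 0; ; n++) {
        int len = link_farm_path(path, pathsize, dir, digest, n);
        if (len < 0 || (size_t)len >= pathsize)
            return -1;
        if (stat(path, &st) != 0) {
            *exists = 0;
            return (int)n;
        }
        if (same_contents(path, received)) {
            *exists = 1;
            return (int)n;
        }
    }
}
```

For a digest whose bytes are 0x00 through 0x0f and a DIR of /farm, link_farm_path yields /farm/00010203/0405060708090a0b0c0d0e0f/0. Under this sketch, two colliding files of different contents end up at .../0 and .../1, matching the "consecutive numbers for <n>" behaviour described above.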