"Jason M. Felice" writes: > This patch adds the --link-by-hash=DIR option, which hard links received > files in a link farm arranged by MD4 file hash. The result is that the system > will only store one copy of the unique contents of each file, regardless of > the file's name. > > (rev 2) > * This revision is actually against CVS HEAD (I didn't realize I was working > from a stale rsync'd CVS). > * Apply permissions after linking (permissions were lost if we already had > a copy of the file in the link farm).

I haven't studied your patch, but I have a couple of comments/questions:

  - If you update permissions, then all hardlinks will change too.
    Does that mean that all instances of an identical file will get
    the last mtime/permissions/ownership? Or does the link farm have
    unique entries for contents plus metadata (vs. just contents)?

  - Some file systems have a hardlink limit of 32000. You will need
    to roll to a new file when that limit is exceeded (i.e., link()
    fails). Also, empty files tend to be quite prevalent, so it is
    probably easier to just create those files and not link them
    (should be no difference in disk usage).

  - How does this patch interact with -H?

Craig
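
To make the second point concrete, here is a rough sketch of the rollover idea, in Python rather than rsync's C. It is not taken from the patch. The farm layout (`<farm>/<2 hex chars>/<rest>_<n>`), the helper names, and the injectable `link` parameter are all hypothetical, and SHA-256 stands in for MD4 because MD4 is often unavailable in `hashlib`. It also trusts the hash and does not compare file contents.

```python
import errno
import hashlib
import os


def farm_path(farm_dir, digest, n):
    # Hypothetical layout: <farm>/<first 2 hex chars>/<remaining hex>_<n>
    return os.path.join(farm_dir, digest[:2], f"{digest[2:]}_{n}")


def link_by_hash(path, farm_dir, link=os.link):
    """Hard link `path` to a farm entry holding the same contents.

    Returns the farm entry used, or None for an empty file.
    `link` is injectable only so the limit can be simulated.
    """
    if os.path.getsize(path) == 0:
        # Empty files are left alone: linking them saves no disk space.
        return None

    with open(path, "rb") as f:
        # Assumption: SHA-256 instead of the MD4 hash the patch describes.
        digest = hashlib.sha256(f.read()).hexdigest()

    n = 0
    while True:
        entry = farm_path(farm_dir, digest, n)
        os.makedirs(os.path.dirname(entry), exist_ok=True)

        if not os.path.exists(entry):
            # First copy of this content, or first after a rollover:
            # the received file itself becomes the farm entry.
            link(path, entry)
            return entry

        tmp = path + ".lbh-tmp"
        try:
            link(entry, tmp)
        except OSError as e:
            if e.errno == errno.EMLINK:
                n += 1  # entry hit the link limit, roll to the next one
                continue
            raise
        os.replace(tmp, path)  # swap the received file for the link
        return entry
```

Note that every file linked to the same entry shares one inode, so a later chmod/chown/utime on any of them changes all of them, which is exactly the concern in the first point.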