On Wed, Jul 15, 2009 at 12:54 PM, Jamie Lokier <ja...@shareable.org> wrote:
> It still has to send the hashes, which can be slow for a large file.
> So it would be even better to cache on the sending side hashes of
> files on the receiving side, perhaps indexed by the receiving side's
> MD5 of the whole file.

The hashes for a 16 GB file at the default block size take about 28 bytes per 128 KB block, or roughly 0.02% of the file size, which works out to around 3.5 MB. That is peanuts in the grand scheme of things when dealing with large files, so I suppose the hashes should be stored on whichever side makes the implementation easier or more robust. If hashes were cached on the receiver, no protocol changes would be necessary, I think. The hash list would just arrive back at the sender without the delay of computing it first.

> There are two meanings of "stateless":
>
> 1. It compares files on the sender and receiver, does not keep a
>    list of what it sent before, so always works even if files on
>    the receiver have been changed without using rsync.
>
> 2. It does not keep auxiliary data such as precomputed hashes to
>    optimize the "stateless" update operation.
>
> Perhaps the rsync maintainers meant 1, and you thought they meant 2?
>

I'm not sure what is truly meant by stateless in this context. "Rsync is stateless" does seem to be an often-repeated mantra, though:

http://www.google.com/search?q=rsync+stateless+site:lists.samba.org

Unison is often suggested as an alternative, but it really doesn't handle large files well, and it has no equivalent of --fuzzy. It's also written in OCaml, making it even less likely that someone can fix those issues now that the creators have moved on.

-- 
RPM
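
P.S. For anyone who wants to check the 3.5 MB figure above, here is a minimal sketch of the arithmetic. It assumes the numbers quoted in this message (28 bytes of checksum data per block, 128 KB blocks) rather than values read from the rsync source, and the function name checksum_overhead is purely illustrative.

```python
def checksum_overhead(file_size, block_size=128 * 1024, bytes_per_block=28):
    """Estimate the size of the per-block hash list for one file.

    Assumptions (taken from the message, not from rsync itself):
    block_size is 128 KB and each block costs 28 bytes of hash data.
    """
    # Round up so a trailing partial block still gets a hash entry.
    blocks = (file_size + block_size - 1) // block_size
    total = blocks * bytes_per_block
    return blocks, total, total / file_size


if __name__ == "__main__":
    blocks, total, ratio = checksum_overhead(16 * 1024**3)
    print(f"{blocks} blocks, {total / 2**20:.2f} MB of hashes, {ratio:.4%} of the file")
```

Because the overhead is a fixed fraction of the file size under these assumptions, the same ratio of about 0.02% applies to any file large enough to use 128 KB blocks.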