On Mon, Jul 13, 2009 at 4:54 PM, Jamie Lokier<ja...@shareable.org> wrote:
>
> Remembering hashes doesn't make any difference to speed, if the
> bottleneck is the sending side.
Except that in the rsync pipeline, reading the destination file to compute hashes happens BEFORE the sender reads its file. The sender then calculates its own hashes and finds matches on the fly. So, when transferring a large file, it goes something like this from the sender's perspective:

1) sending file list
2) receiving file list
3) file xxxx is different! Receiver, please give me some hashes
4) <wait 20+ minutes for receiver to compute hashes> got hashes
5) begin transfer, calculating my hashes and compressing on the fly as I transfer
6) file complete

By caching hashes on the receiving side, the transfer can begin almost instantaneously if the file on the receiver is unchanged since the last run of rsync. That is almost always the case for the way most people use rsync (backups, file distribution, etc.).

Most of my rsync scripts "stall" for minutes doing no effective work, because they are waiting for the destination to read and hash a large file that was already hashed yesterday.

Incidentally, hashes could also be "remembered" on the sending side and sent to the receiver. You would of course fall back to the current behavior if the file had changed on both ends, or if somehow a whole-file checksum failed.

--
RPM
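
P.S. To make the caching idea concrete, here is a rough Python sketch of what a receiver-side hash cache might look like. This is not how rsync works today and none of it is rsync code: the function names, the JSON cache file, the 64 KiB block size, and the use of adler32 and SHA-256 as stand-ins for rsync's weak and strong checksums are all my own assumptions for illustration. The only point is that the expensive read of the destination file is skipped when its size and mtime match what was recorded on the previous run.

```python
import hashlib
import json
import os
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical; rsync chooses its own block size


def compute_block_hashes(path, block_size=BLOCK_SIZE):
    """Read the whole file and return one [weak, strong] pair per block."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            weak = zlib.adler32(block)                  # stand-in weak checksum
            strong = hashlib.sha256(block).hexdigest()  # stand-in strong checksum
            hashes.append([weak, strong])
    return hashes


def cached_block_hashes(path, cache_path, block_size=BLOCK_SIZE):
    """Return (hashes, from_cache).

    The cached hashes are reused only if the file's size and mtime (and the
    block size) match what was stored on the previous run. Otherwise the
    file is re-read and the cache entry is replaced.
    """
    st = os.stat(path)
    key = os.path.abspath(path)
    stamp = [st.st_size, st.st_mtime_ns, block_size]

    try:
        with open(cache_path) as f:
            cache = json.load(f)
    except (FileNotFoundError, ValueError):
        cache = {}  # no cache yet, or unreadable: start over

    entry = cache.get(key)
    if entry is not None and entry["stamp"] == stamp:
        return entry["hashes"], True   # fast path: no read of the big file

    hashes = compute_block_hashes(path, block_size)  # slow path
    cache[key] = {"stamp": stamp, "hashes": hashes}
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return hashes, False
```

Trusting size and mtime is the same kind of shortcut rsync's default quick check already takes when deciding whether a file needs transferring at all, so a stale cache entry would be caught the same way as described above: by the whole-file checksum failing and the transfer falling back to the current behavior.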