Ryan Malayter (malay...@gmail.com) wrote on 14 July 2009 17:00:
>On Mon, Jul 13, 2009 at 4:54 PM, Jamie Lokier<ja...@shareable.org> wrote:
>>
>> Remembering hashes doesn't make any difference to speed, if the
>> bottleneck is the sending side.
>
>Except that in the rsync pipeline, reading the destination file to
>get hashes happens BEFORE the sender reads its file. And the sender
>calculates hashes and finds matches on-the-fly.
>
>So, when transferring a large file, it goes something like this from
>the sender's perspective:
>1) sending file list
>2) receiving file list
>3) file xxxx is different! Receiver, please give me some hashes
>4) <wait 20+ minutes for receiver to compute hashes> got hashes
>5) begin transfer, calculating my hashes and compressing on the fly as
>I transfer
>6) file complete
>
>By caching hashes on the receiving side, the transfer can begin almost
>instantaneously if the file on the receiver is unchanged since the
>last run of rsync. This is, in fact, almost always true for the way
>most people use rsync (backups, file distribution, etc.)
>
>Most of my rsync scripts "stall" for minutes doing no effective work,
>because they are waiting for the destination to read and calculate
>hashes of a large file that was already hashed yesterday.

Hash calculation is very fast; rsync's CPU consumption is negligible.
What limits it is reading the disk. If you run a hash check, you'll see
the process stalled on I/O, not on CPU. Maybe your machine has an
unusual I/O-to-CPU ratio?

This, and the fact that the maintainer(s?) want to keep rsync stateless,
makes me think that a change to remember hashes is unlikely.
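
If you want to check where the time goes on your own machine, a rough
sketch like the one below separates the time spent reading a file from
the time spent hashing it. This is not rsync's code: the function name
time_read_vs_hash and the 1 MiB read size are made up for illustration,
and MD5 from Python's hashlib is only a stand-in for whatever checksum
your rsync version actually uses.

```python
import hashlib
import time

# Assumption: 1 MiB reads. This is an arbitrary choice, not rsync's block size.
BLOCK_SIZE = 1 << 20


def time_read_vs_hash(path, block_size=BLOCK_SIZE):
    """Make one pass over `path`; return (read_seconds, hash_seconds, hex_digest)."""
    digest = hashlib.md5()  # stand-in checksum, not necessarily rsync's
    read_s = 0.0
    hash_s = 0.0
    # buffering=0 disables Python's own buffering, so each read() goes to the OS.
    with open(path, "rb", buffering=0) as f:
        while True:
            t0 = time.perf_counter()
            block = f.read(block_size)
            t1 = time.perf_counter()
            read_s += t1 - t0
            if not block:  # an empty read means end of file
                break
            digest.update(block)
            hash_s += time.perf_counter() - t1
    return read_s, hash_s, digest.hexdigest()
```

Call it as time_read_vs_hash("/path/to/large/file") and compare the
first two values. Keep in mind that a second run over the same file is
usually served from the OS page cache, so the read time will look much
smaller than it does for a cold file; only the first run after the
cache has been evicted reflects what the disk can do.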