On Tue, Jul 14, 2009 at 5:17 PM, Carlos Carvalho <car...@fisica.ufpr.br> wrote:
> Hash calculation is very fast; rsync has a negligible cpu consumption.

Hash calculation for the receiver is usually disk-bound, but rsync has massive CPU consumption in certain cases, for example when using -z on a fast network. I have seen rsync become CPU-bound on a 100 Mbps WAN using a 3 GHz Xeon 5400-series. Even without -z, simply looking for hash matches (and calculating the strong checksums for weak matches) can be very CPU-intensive on the sending side. That is the whole point, really: rsync trades CPU for network bytes.

> What limits it is reading the disk. If you run a hash check you'll see
> the process stalled in io and not cpu. Maybe your machine has a
> particularly different IO/cpu ratio?
> This, and the fact that the maintainer(s?) want to keep rsync stateless,
> makes me think that a change to remember hashes is unlikely.

Yes, in this case the receiver *is* waiting on disk initially. My issue is that this I/O is completely pointless and takes 20-40 minutes of wall-clock time. Why re-read 50 GB and re-calculate hashes for it when the sender did it yesterday?

Storing a cache of hashes that is only used when a file is unchanged would *not* change the user's perception of rsync as "stateless"; it would simply be an optimization. Rsync already uses temporary files for partial data.

That said, it's open source: I should just drop a patch bomb. My fear is that it would take me forever and be rejected, as I haven't coded in C since 1996. My employer can't sponsor development, so I figured I would just rant in this forum.

--
RPM
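
For illustration only, here is a rough sketch of the kind of hash cache described above, in Python rather than C. Nothing in it is rsync code: the `HashCache` class, the JSON sidecar file, and the "unchanged means same size and mtime" test are all assumptions of mine, and a whole-file MD5 stands in for rsync's per-block checksums. The point is only that a cached digest is reused when the file looks unchanged and the file is re-read otherwise.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Read the whole file and return its MD5 hex digest (the expensive step)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class HashCache:
    """Hypothetical sidecar cache mapping path -> size, mtime and digest."""

    def __init__(self, cache_path):
        self.cache_path = cache_path
        self.hits = 0
        self.misses = 0
        try:
            with open(cache_path) as f:
                self.entries = json.load(f)
        except (FileNotFoundError, ValueError):
            # No cache yet, or an unreadable one: start empty.
            self.entries = {}

    def digest(self, path):
        """Return the digest, re-reading the file only if it looks changed."""
        st = os.stat(path)
        key = os.path.abspath(path)
        entry = self.entries.get(key)
        # Assumption: same size and mtime means the content is unchanged.
        if (entry and entry["size"] == st.st_size
                and entry["mtime_ns"] == st.st_mtime_ns):
            self.hits += 1
            return entry["digest"]
        self.misses += 1
        value = file_md5(path)
        self.entries[key] = {
            "size": st.st_size,
            "mtime_ns": st.st_mtime_ns,
            "digest": value,
        }
        return value

    def save(self):
        """Write the cache atomically so a crash cannot leave it half-written."""
        tmp = self.cache_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.entries, f)
        os.replace(tmp, self.cache_path)
```

Because the cache is consulted only when size and mtime match, deleting it costs nothing but a full re-read on the next run, which is why I would call it an optimization and not state.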