On Tue, Jul 14, 2009 at 5:17 PM, Carlos Carvalho <car...@fisica.ufpr.br> wrote:
> Hash calculation is very fast; rsync has a negligible cpu consumption.

Hash calculation on the receiver is usually disk-bound, but rsync has
massive CPU consumption in certain cases, such as when using -z on a
fast network. I have seen rsync become CPU-bound on a 100 Mbps WAN
using a 3 GHz Xeon 5400-series. Even without -z, simply looking for hash
matches (and calculating the strong checksums for weak matches) can be
very CPU-intensive on the sending side. That is the whole point,
really: rsync trades CPU for network bytes.
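
To make the sending-side cost concrete, here is a rough Python sketch of
the weak/strong matching idea. It is a simplified illustration, not
rsync's actual code: the function names are made up for this example, MD5
is only a stand-in for the strong checksum, and unlike rsync it does not
skip ahead by a block after a match. The point is that the sender tests
the weak checksum at every byte offset and pays for a strong hash on
every weak hit.

```python
import hashlib

M = 1 << 16  # modulus for both halves of the weak checksum


def weak_checksum(block):
    """Return the (a, b) weak checksum pair for a block of bytes."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window forward by one byte in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def build_signatures(old, n):
    """Receiver side: map weak checksum -> {strong digest: block offset}."""
    sigs = {}
    for off in range(0, len(old) - n + 1, n):
        block = old[off:off + n]
        strong = hashlib.md5(block).digest()  # stand-in strong checksum
        sigs.setdefault(weak_checksum(block), {})[strong] = off
    return sigs


def find_matches(new, sigs, n):
    """Sender side: return (matches, number of strong hashes computed)."""
    matches, strong_count = [], 0
    if len(new) < n:
        return matches, strong_count
    a, b = weak_checksum(new[:n])
    pos = 0
    while True:
        candidates = sigs.get((a, b))
        if candidates is not None:
            # Weak hit: only now is the expensive strong hash computed.
            strong_count += 1
            off = candidates.get(hashlib.md5(new[pos:pos + n]).digest())
            if off is not None:
                matches.append((pos, off))
        if pos + n >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + n], n)
        pos += 1
    return matches, strong_count
```

For example, build_signatures(old, 4) followed by
find_matches(new, sigs, 4) returns (offset in new, offset in old) pairs
for every 4-byte block of old that reappears anywhere in new.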

> What limits it is reading the disk. If you run a hash check you'll see
> the process stalled in io and not cpu. Maybe your machine has a
> particularly different IO/cpu ratio?
> This, and the fact that the maintainer(s?) want to keep rsync stateless,
> makes me think that a change to remember hashes is unlikely.

Yes, in this case the receiver *is* waiting on disk initially. The
fact that the I/O is completely pointless and takes 20-40 minutes of
wall-clock time is my issue. Why re-read 50 GB and re-calculate hashes
for it when the sender did it yesterday?

Storing a cache of hashes that is only used when a file is unchanged
would *not* change the user's perception of rsync as "stateless"; it
would simply be an optimization. Rsync already uses temporary files for
partial data.
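
Roughly what I have in mind, as a Python sketch rather than a patch.
Nothing here exists in rsync: the function names, the JSON cache file and
the use of a whole-file MD5 are all assumptions for illustration. A file
counts as "unchanged" when its size and mtime match the cached entry,
which is the same kind of test rsync's default quick check relies on.

```python
import hashlib
import json
import os


def file_digest(path, chunk_size=1 << 20):
    """Read the whole file and return its MD5 hex digest (the slow path)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_digest(path, cache):
    """Return (digest, was_cache_hit); re-read the file only if it changed."""
    st = os.stat(path)
    key = os.path.abspath(path)
    entry = cache.get(key)
    if (entry is not None
            and entry["size"] == st.st_size
            and entry["mtime_ns"] == st.st_mtime_ns):
        return entry["digest"], True  # unchanged: no disk read needed
    digest = file_digest(path)
    cache[key] = {
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "digest": digest,
    }
    return digest, False


def load_cache(cache_path):
    """Load the cache; a missing file just means an empty cache."""
    try:
        with open(cache_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_cache(cache, cache_path):
    """Persist the cache between runs (hypothetical on-disk format)."""
    with open(cache_path, "w") as f:
        json.dump(cache, f)
```

If the cache file is deleted or stale, the worst case is today's
behaviour: the file is re-read and re-hashed.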

That said, it's open source: I should just drop a patch bomb. My fear
is that it would take me forever and be rejected, as I haven't coded
in C since 1996. My employer can't sponsor development, so I figure I
would just rant in this forum.

-- 
RPM