Hi list,

I've found this post on rsync's expected performance for large files:
https://lists.samba.org/archive/rsync/2007-January/017033.html

I have a related but different observation to share: with files in the multi-gigabyte range, I've noticed that rsync's runtime also depends on how much the source and destination diverge, i.e., synchronization is faster when the files are similar. This is not just because less data must be transferred. For example, on an 8 GiB file with 10% updates, rsync takes 390 seconds; with 50% updates, about 1400 seconds; and with 90% updates, about 2400 seconds.

My current explanation, which I'd be grateful if someone more knowledgeable than me could confirm, is this: with very large files, we'd expect a certain number of false alarms, i.e., cases where the weak checksum matches but the strong checksum does not. With large files that are very similar, a weak match is very likely to be confirmed by a matching strong checksum. Conversely, with large files that are very dissimilar, a weak match is much less likely to be confirmed by the strong checksum, precisely because the files differ so much. rsync then ends up computing many strong checksums that never result in a match.

Is this a valid/reasonable explanation? Can anyone confirm this relationship between rsync's computational overhead and the files' (dis)similarity?

Thanks,
Thomas
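
P.S. To make the mechanism I have in mind concrete, here is a minimal Python sketch of the two-level weak/strong matching scheme. It is not rsync's actual code: the block size, Adler-32 as a stand-in for the rolling checksum, and MD5 as the strong checksum are illustrative assumptions, and the weak checksum is recomputed at each offset instead of being rolled. The point is only to show where strong checksums get computed on false alarms.

  import hashlib
  import zlib

  BLOCK = 700  # illustrative block size, not rsync's actual choice

  def weak(data):
      # stand-in for rsync's rolling checksum; a real implementation
      # rolls this in O(1) per byte instead of recomputing it
      return zlib.adler32(data)

  def strong(data):
      # stand-in for rsync's strong checksum
      return hashlib.md5(data).digest()

  def make_signature(basis):
      # weak checksum -> list of (strong checksum, offset) per basis block
      sig = {}
      for off in range(0, len(basis), BLOCK):
          block = basis[off:off + BLOCK]
          sig.setdefault(weak(block), []).append((strong(block), off))
      return sig

  def scan(src, sig):
      # slide over the source; on each weak hit, confirm with the strong checksum
      matches = false_alarms = 0
      i = 0
      while i + BLOCK <= len(src):
          window = src[i:i + BLOCK]
          candidates = sig.get(weak(window))
          if candidates:
              s = strong(window)  # strong checksum computed on every weak hit
              if any(s == cs for cs, _ in candidates):
                  matches += 1
                  i += BLOCK      # block matched, jump past it
                  continue
              false_alarms += 1   # weak matched, strong did not
          i += 1
      return matches, false_alarms

In this sketch, a source that shares most blocks with the basis advances mostly in block-sized jumps and its weak hits are confirmed, whereas a dissimilar source advances byte by byte and any weak hit it does get is almost never confirmed, so the strong-checksum work is wasted. That wasted work is what I suspect accounts for the extra runtime I'm seeing.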
https://lists.samba.org/archive/rsync/2007-January/017033.html I have a related but different observation to share: with files in the multi-gigabyte-range, I've noticed that rsync's runtime also depends on how much the source/destination diverge, i.e., synchronization is faster if the files are similar. However, this is not just because less data must be transferred. For example, on an 8 GiB file with 10% updates, rsync takes 390 seconds. With 50% updates, it takes about 1400 seconds, and at 90% updates about 2400 seconds. My current explanation, and it would be awesome if someone more knowledgeable than me could confirm, is this: with very large files, we'd expect a certain level of false alarms, i.e., weak checksum matches, but strong checksum does not. However, with large files that are very similar, a weak match is much more likely to be confirmed with a matching strong checksum. Contrary, with large files that are very dissimilar a weak match is much less likely to be confirmed with a strong checksum, exactly because the files are very different from each other. rsync ends up computing lots of strong checksums, which do not result in a match. Is this a valid/reasonable explanation? Can someone else confirm this relationship between rsync's computational overhead and the file's (dis)similarity? Thanks, Thomas. -- Please use reply-all for most replies to avoid omitting the mailing list. To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html