devz...@web.de wrote on 6 August 2009 20:15:
> i'm using rsync to sync large virtual machine files from one esx server to
> another.
> the speed is "reasonable", but i guess it's not the optimum - at least i
> don't know where the bottleneck is.

That's vague and subjective, so it's difficult to answer.

> i read that rsync would not be very efficient with ultra-large files (i'm
> syncing files with up to 80gb size)

The larger the file, the longer it takes to locate matches and differences, but you save more transfers. It also depends on the rsync version you're using; the latest versions are better.

> regarding the bottleneck: neither cpu, network nor disk is at its limit -
> neither on the source nor on the destination system.
> i don't see 100% cpu, i don't see 100% network or 100% disk i/o usage

There may be other factors interfering. To see how fast rsync is you need a comparison with another transfer program. To see the effect of the overhead you could use the -W option.

devz...@web.de wrote on 7 August 2009 18:44:
> so, the question is: is rsync's rolling checksum algorithm the perfect
> (i.e. fastest) algorithm to match changed blocks at fixed locations
> between source and destination files?

No. If you know where the differences are you can optimize for them. This would avoid discovering differences and matches, would avoid reading the identical portions on the sender, and would avoid reading them twice on the destination. Updating in place would even avoid reading them at all.

If you know so much about the files, they're so big, and the differences are so small, why don't you just sync the variable portions and merge them?

> but what i'm unsure about is whether rsync isn't doing too much work
> detecting the differences. it doesn't need to "look forth and
> back" (as i read somewhere it would),

It doesn't. Everything is determined in a single pass over the file, at both ends.

>> > besides that, for transferring complete files i know faster methods than
>> > rsync. ...
> here is an example: http://communities.vmware.com/thread/29721

All this can be done with rsync too (with the -W option). On the destination side I think it's unlikely anything goes faster; all programs should be similar. On the source side it's possible to go faster if the sender uses sendfile() (or the equivalent in other operating systems); there's a tiny sketch of that at the end of this message.

>> Assuming the rsync algorithm works correctly, I don't
>> see any difference between the end result of copying
>> a 100gb file with the rsync algorithm or without it.
>> The only difference is the amount of disk and network
>> I/O that must occur.
>
> the rsync algorithm is using checksumming to find differences.
> checksums are a sort of "data reduction" which creates a hash from
> a larger amount of data. i just want to understand what makes
> sure that there are no hash collisions which break the algorithm.

There are several checksums. The ones used to find differences are weak, but at the end rsync checks the md5sum of the whole file, and if the sums don't match it transfers the file once more (tweaking things to avoid the same mismatch happening again). So the probability of failure is that of accidentally having two different files with the same md5sum, which is 2^(-128). A rough sketch of how the checksums fit together is at the end of this message.

> mind that rsync has existed for some time, and in that time file sizes
> transferred with rsync may have grown by a factor of 100 or
> even 1000.

It used to use MD4. Version 3 uses MD5, which is safer.
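
For the archives, here is a rough Python sketch of how the weak and strong checksums fit together. It is not rsync's code: the block size, the simplified rolling checksum and all the names are illustrative, it works on in-memory bytes instead of streaming over the wire, and it leaves out the per-transfer checksum seed that rsync changes when it has to retry a file.

import hashlib

BLOCK = 4096  # illustrative; rsync derives the block size from the file size


def weak_sum(data):
    """Simplified rolling checksum: two 16-bit sums, Adler-32 style."""
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) & 0xFFFF
    return (b << 16) | a


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte forward in O(1): drop out_byte, add in_byte."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - block_len * out_byte + a) & 0xFFFF
    return a, b


def signatures(old_data):
    """Receiver side: one weak and one strong checksum per block of the old file."""
    sigs = {}
    for off in range(0, len(old_data), BLOCK):
        block = old_data[off:off + BLOCK]
        sigs.setdefault(weak_sum(block), []).append(
            (hashlib.md5(block).digest(), off // BLOCK))
    return sigs


def delta(new_data, sigs):
    """Sender side: one pass over the new file, rolling the weak checksum
    byte by byte and computing the strong checksum only on weak hits.
    Returns a list of block indices (data the receiver already has) and
    literal byte strings (data that has to be sent)."""
    n = len(new_data)
    if n < BLOCK:
        return [bytes(new_data)] if new_data else []
    out, lit, i = [], bytearray(), 0
    a = sum(new_data[:BLOCK]) & 0xFFFF
    b = sum((BLOCK - k) * byte for k, byte in enumerate(new_data[:BLOCK])) & 0xFFFF
    while True:
        match = None
        candidates = sigs.get((b << 16) | a)
        if candidates:
            strong = hashlib.md5(new_data[i:i + BLOCK]).digest()
            match = next((idx for s, idx in candidates if s == strong), None)
        if match is not None:
            if lit:
                out.append(bytes(lit))
                lit = bytearray()
            out.append(match)          # reference to a block the receiver has
            i += BLOCK
            if i + BLOCK > n:
                break
            a = sum(new_data[i:i + BLOCK]) & 0xFFFF
            b = sum((BLOCK - k) * byte
                    for k, byte in enumerate(new_data[i:i + BLOCK])) & 0xFFFF
        else:
            lit.append(new_data[i])    # no match: this byte must be sent
            if i + 1 + BLOCK > n:
                i += 1
                break
            a, b = roll(a, b, new_data[i], new_data[i + BLOCK], BLOCK)
            i += 1
    lit += new_data[i:]                # unmatched tail of the file
    if lit:
        out.append(bytes(lit))
    return out


def rebuild(old_data, instructions):
    """Receiver side: reconstruct the new file from block references and literals."""
    return b"".join(old_data[x * BLOCK:(x + 1) * BLOCK] if isinstance(x, int)
                    else x for x in instructions)


if __name__ == "__main__":
    old = bytes(range(256)) * 200
    new = old[:5000] + b"some changed bytes" + old[5000:]
    instr = delta(new, signatures(old))
    sent = sum(len(x) for x in instr if not isinstance(x, int))
    print("literal bytes sent:", sent, "of", len(new))
    # Whole-file check: real rsync compares a strong checksum of the result
    # and, on a mismatch, transfers the file again with a different seed.
    assert hashlib.md5(rebuild(old, instr)).digest() == hashlib.md5(new).digest()

The weak checksum only has to be cheap to slide one byte at a time; the strong checksum (MD4 in older rsync, MD5 in version 3) is what actually decides that a block matches.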
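
PS: and here is a minimal sketch of the whole-file path (the -W case). It assumes a plain TCP receiver on the other end; the host, port and path are placeholders, and it is not what rsync itself does internally, only an illustration of the sendfile() point above.

import socket

def send_whole_file(path, host="dest.example.com", port=9000):
    """Push one file into a TCP connection using the kernel's sendfile()."""
    with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() (file -> socket inside the
        # kernel, no copies through user space) where the platform supports
        # it, and falls back to a plain read/send loop otherwise.
        return sock.sendfile(f)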