Hi Wayne,

Thanks for your response - it's much appreciated. Comments below.

Wayne Davison wrote:
> The only checksum that is being cached is the one that the user can
> optionally request for a pre-transfer check.  It's not usually needed,
> unless the "quick check" algorithm (size + mtime) has a chance of being
> wrong.
  

In our case, the mtimes are going to differ: users would be installing a game from a CD, so their files would carry the mtime from that initial install. The quick check would therefore be unreliable, and checksums would be needed.
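To make the difference concrete, here is a minimal Python sketch of the two kinds of check. It is not rsync's code: the function names are my own, and MD5 is only a stand-in for whatever whole-file checksum rsync actually uses.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def quick_check_same(path_a, path_b):
    """Size + mtime comparison: cheap, but fooled by a differing mtime."""
    a, b = os.stat(path_a), os.stat(path_b)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def checksum_same(path_a, path_b):
    """Whole-file checksum comparison: has to read both files in full."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return file_md5(path_a) == file_md5(path_b)
```

For a file installed from CD, quick_check_same() would report a difference even when the contents are identical, while checksum_same() would not, at the cost of reading the whole file.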

>> A better update strategy would be some kind of a binary patch algorithm.

> You may want to check into some other binary-patching software to see
> what your options are (I haven't looked into it).
  

We've looked briefly at rsync's batch mode, but using it would likely be much the same as using any of the other binary-patching solutions out there. We'd have to deal with the complexity of updating from multiple different source versions, which would add development work I was hoping we might avoid by just using rsync directly.

>> Lastly, does anyone have any empirical data on how well an rsync server
>> with checksum-updating works with large number (eg: hundreds to
>> thousands) of simultaneous clients?
    

> Not that I know of.  For really large files, that is likely to be quite
> a memory and CPU hog.  Each client will be sending you checksum data for
> the whole file, and then the server will be doing its own checksumming
> and block comparisons using this in-memory checksum cache.
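For a rough sense of scale, the per-client checksum data can be estimated with some back-of-envelope arithmetic. The sketch below is only that: the block size and the bytes-per-block figure are assumptions I picked for illustration, and the real values depend on the rsync version, its options, and the file size.

```python
def checksum_payload_bytes(file_size, block_size, bytes_per_block):
    """Bytes of block-checksum data one client sends for one file."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * bytes_per_block


# Illustrative assumptions, not measured rsync values:
FILE_SIZE = 1 << 30        # a 1 GiB file
BLOCK_SIZE = 8 * 1024      # 8 KiB blocks
BYTES_PER_BLOCK = 4 + 16   # 4-byte weak sum + 16-byte strong sum

per_client = checksum_payload_bytes(FILE_SIZE, BLOCK_SIZE, BYTES_PER_BLOCK)
print(per_client)          # 2621440 bytes, i.e. 2.5 MiB per client
print(per_client * 1000)   # 2621440000 bytes for 1000 simultaneous clients
```

Under those assumptions, a thousand simultaneous clients would have the server holding roughly 2.4 GiB of checksum data for a single 1 GiB file.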
  

I'm still a bit confused on this last point. Would the cached checksums from the checksum-updating patch mean that the server would only have to do the block comparisons, or would it still need to calculate the checksums itself for every client? In other words, does the checksum-updating patch cache the individual block checksums within a file, or just an overall file checksum?

Also, is it the server that does the block comparisons and decides what data to send, or does that happen on the client?  If it's the server, that would certainly be a good deal more overhead than I was expecting.  From the "How rsync works" document (http://samba.anu.edu.au/rsync/how-rsync-works.html), it sounded like the receiver (i.e., the client) became the 'generator', and I had assumed that the generator was responsible for requesting the individual blocks.  A re-read suggests that it is in fact the server that has to do the block comparisons, as you seem to be suggesting.
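Here is how I now understand the division of labour, as a toy Python sketch. This is my reading of the document, not rsync's actual code: the function names, the tiny block size, and the simplified weak checksum are all my own assumptions, and the weak sum is recomputed at every offset rather than rolled.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only


def weak_sum(data):
    """Adler-style weak checksum (a stand-in, not rsync's exact one)."""
    a = sum(data) % 65536
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % 65536
    return (b << 16) | a


def receiver_signatures(old, block=BLOCK):
    """Receiver (generator): checksum each full block of the file it has."""
    table = {}
    for index in range(len(old) // block):
        chunk = old[index * block:(index + 1) * block]
        entry = (hashlib.md5(chunk).digest(), index)
        table.setdefault(weak_sum(chunk), []).append(entry)
    return table


def sender_delta(new, table, block=BLOCK):
    """Sender: scan its own file, emitting ('copy', i) or ('data', bytes)."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        match = None
        if len(chunk) == block:
            # Weak sum first; confirm candidates with the strong sum.
            for strong, index in table.get(weak_sum(chunk), []):
                if strong == hashlib.md5(chunk).digest():
                    match = index
                    break
        if match is None:
            literal.append(new[pos])  # no match here: send this byte as-is
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def receiver_apply(old, ops, block=BLOCK):
    """Receiver: rebuild the new file from its old blocks plus literal data."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)
```

With old = b"abcdefghijkl" on the client and new = b"abcdXXefghijkl" on the server, sender_delta() yields a copy of block 0, the literal b"XX", then copies of blocks 1 and 2. The scan over every byte offset of the new file is the part that lands on the sender, which in our case is the server.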

Wouldn't it be more efficient in general for that to happen on the client side, though?  One side has to transfer the block checksums to the other in any case, so why not have the server send them, and let the client do the block comparisons and then request individual blocks from the server?

Take care,
 -Gav

-- 
Gavriel State, Founder & CTO
TransGaming Inc.
[EMAIL PROTECTED]
http://www.transgaming.com

Broadening The Playing Field
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
