Re: Factor out .rsyncsums logic into a separate checksum-caching library?

Matt McCutchen Sat, 30 Jun 2007 12:47:29 -0700

On 6/30/07, Wayne Davison <[EMAIL PROTECTED]> wrote:

> On Sun, Jun 24, 2007 at 01:03:03PM -0400, Matt McCutchen wrote:
> > Specifically, it has protection against being fooled when a file's
> > checksum is cached and the file is modified again in the same second;
> > .rsyncsums could use this.
>
> I tried to find a description for this algorithm, but didn't see it
> mentioned in any of the web searches I made.  Is the algorithm described
> anywhere?  Or is my only choice to dig into the source and try to find
> it?

Try here:

http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD

> > The git index has been heavily used and tested, so you might find it
> > helpful when implementing a checksum cache for rsync.
>
> The problem with this is that the git cache is SHA1, and rsync needs
> both MD4 and MD5, depending on what protocol version is in effect.
> It should be possible to adapt their code for rsync's purpose, but it's
> probably overkill.  The idea behind the new checksum patch is mainly to
> allow servers to provide cached checksums for their files, especially
> servers whose content is slow to change.

I didn't necessarily think you should reuse any code from the git
cache, just ideas.  You're already storing mtimes and ctimes; it
appears to me that the only relevant things git does that you don't
are storing the size and inode number and protecting against
same-second modification.  Maybe adding those is overkill for rsync's
purposes.
But if I wrote a library implementing a completely foolproof checksum
cache that could be used with MD4, MD5, or any other checksum
algorithm, would you be likely to adopt it for rsync?
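
To make the idea concrete, here is a rough sketch (in Python for
brevity, not rsync's C) of the kind of cache I have in mind.  The
class name ChecksumCache, the in-memory dict, and the `now` parameter
are all made up for illustration; this is neither the .rsyncsums
format nor git's index code.  It only shows the two additions
discussed above: keeping size and inode number next to mtime and
ctime, and refusing to trust an entry that was recorded in the same
second as the file's mtime.

```python
import hashlib
import os
import time


class ChecksumCache:
    """Hypothetical in-memory checksum cache keyed by path."""

    def __init__(self, algorithm="md5"):
        # Any name accepted by hashlib.new(); MD4 availability depends
        # on the local OpenSSL build, so MD5 is used as the default here.
        self.algorithm = algorithm
        self.entries = {}   # path -> (signature, recorded_at, digest)
        self.misses = 0     # number of times a file was actually read

    @staticmethod
    def _signature(st):
        # The stat fields mentioned in the thread.
        return (st.st_size, st.st_ino, st.st_mtime_ns, st.st_ctime_ns)

    def _compute(self, path):
        h = hashlib.new(self.algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        self.misses += 1
        return h.hexdigest()

    def checksum(self, path, now=None):
        st = os.stat(path)
        sig = self._signature(st)
        entry = self.entries.get(path)
        if entry is not None:
            cached_sig, recorded_at, digest = entry
            # Trust the entry only if the stat data is unchanged AND the
            # file's mtime is strictly older than the second in which the
            # entry was recorded.  Otherwise the file could have been
            # modified again within that second without the timestamp
            # changing, so fall through and recompute.
            if cached_sig == sig and int(st.st_mtime) < recorded_at:
                return digest
        digest = self._compute(path)
        recorded_at = int(time.time() if now is None else now)
        self.entries[path] = (sig, recorded_at, digest)
        return digest
```

Usage would be something like `ChecksumCache("md5").checksum(path)`.
The cost of the same-second rule is at most one extra read of a file
that was cached immediately after being written; a real library would
also need to persist the entries and decide how to handle a file that
changes while it is being read.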

Matt