On Thu, 2004-04-08 at 12:36, Martin Pool wrote: > On 5 Apr 2004, Donovan Baarda <[EMAIL PROTECTED]> wrote: > > > librsync needs a whole file checksum. Without it, it silently fails for > > case 1), 3), and 4). > > Yes, a whole-file checksum should be used with it. Presumably > something stronger than md4 like SHA-1.
md4 is probably good enough for most applications. I know it's not as secure as others, but when you take into account the requirement that the signature match as well, compromising it becomes much more complicated. > I think the only question is whether this should be done internally in > librsync, or as a separate process. I can see arguments either way. > > In some cases you might prefer to actually store an signed signature > using something like GPG. Yeah... good point. The rdiff program should probably use a whole file md4sum though. > > librsync could benefit from a random checksum_seed. It would need to be > > included in the signature. Without it librsync is vulnerable to cases 1) > > and 3). > > Random with respect to what? I think it would be nice if repeatedly > summing identicaly files gave identical signatures. Maybe it can vary > depending on only the input data... The problem is repeatability is what makes it vulnerable. If the content fully determines the signature, the content can be crafted to produce a particular signature. It is relatively easy to calculate two different blocks with the same blocksum if the checksum_seed is known. librsync could be modified to detect when two blocks had the same blocksum in a signature, and permutate the checksum seed in a deterministic way to try repeatedly until a signature without collisions is found. This would give repeatable signatures and protect against case 1), but not case 3). It would also allow crafted files that force m/2 recalculations of the signature. I think I've just realised what you were getting at; if the checksum_seed is based on something like the whole file md4sum, it becomes repeatable, but unpredictable. You can't manipulate individual blocks without it affecting every other blocksum, but the signature for the same file is always the same. Nice :-) This would fit in nicely with adding a whole-file checksum to the signature... generate the wholefile md4sum, and use that as the starting "seed" for the individual blocksums. The wholefile sum becomes the seed. This doesn't allow for "permutating" the seed if it happens to result in blocksum clashes. However, given that the probability if this happening is pretty astronomical (when using the wholefile sum seed) and will be caught by a whole-file checksum miss-match, do we care? It is far more likely to get false block matches when calculating the delta (because sums are calculated at every byte boundary, not just at block boundaries). More thoughts on using the wholefile sum as the seed; at first I thought this would still be vulnerable to case 3). Using a any single block as the original file and trying to find any block that matches means you still have "birthday algorithm" numbers of blocks to check (ie 2^(n/2)). However, each block "comparison" requires the recalculation of the "target" blocksum using the "original" blocksum as the seed, resulting in non-birthday algorithm number of checksum calculations (ie, 2^n). I've Cc'd this to the rsync list because they might get some ideas from it. -- Donovan Baarda <[EMAIL PROTECTED]> http://minkirri.apana.org.au/~abo/ -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html