On Thu, Jul 22, 2010 at 09:53:35AM +0100, Philip Martin wrote:
> s...@apache.org writes:
>
> > Author: stsp
> > Date: Tue Jul 20 16:14:53 2010
> > New Revision: 965892
> >
> > URL: http://svn.apache.org/viewvc?rev=965892&view=rev
> > Log:
> > Make svnadmin dump print headers containing MD5 and SHA1 checksums of
> > property content, as was already done for file content. Checksums are
> > printed for revision properties as well as versioned properties.
>
> Do we gain anything by having both MD5 and SHA1 checksums?

We print both kinds of checksum for file content, too, so it's just
for consistency.

> Do we need checksums per property?

Yes, because the idea is that the loader (or other tools handling dump
files) can identify corrupted properties and tell the user which ones
they are.

> Often the checksums will take up more space than the property.

Quite possibly. But the overhead is fixed in size, and it's all ASCII,
so maybe it compresses quite well?

If you are concerned about space, there is already the --deltas option,
which saves the bulk of the size of a regular v2 dump. I haven't done
any empirical analysis, but I'd suspect that deltas of file content
will proportionally save more space than the property hashes add.

> What about property names?

Hmmm... names aren't covered by checksums right now, that is true.
Maybe we should compute the hash over "key=value" strings, such as
"svn:eol-style=native"?

> Perhaps we could just have one checksum that includes all the property
> names and values?

I'd like loaders to be able to tell users which properties are corrupted.

Note that I've added this in response to a complaint that dump files
carry very little information about their integrity (and that svnsync
does close to zero consistency checks, too, but that is a different
story).

I found out that we already have checksums for file content in 1.6.x,
but that properties don't have any checksums. That's just unnecessary
inconsistency. If we're going to have one, we should have the other.

The checksums come before the content, so the loader can easily be made
to verify content. It would make sense to make our own loader start
verifying checksums soon.

In general, having checksums built into the dump file itself does not
protect against purposeful corruption. It's just a safety net for
accidental corruption. For security purposes, people should
cryptographically sign the entire dump file.

Stefan
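
P.S. To make the "key=value" idea a bit more concrete, here is a rough
Python sketch of per-property checksums and of how a loader could name
the properties that fail verification. This is only an illustration,
not what svnadmin does: the function names, the in-memory dict layout
and the choice of hashing the bytes of "name=value" are all assumptions
made for the example, and no real dump file headers are parsed.

```python
import hashlib


def property_checksums(name: str, value: bytes) -> dict:
    """Return MD5 and SHA1 hex digests over the bytes of "name=value".

    Hashing name and value together is the assumption discussed above;
    it means a damaged name is detected as well as a damaged value.
    """
    data = name.encode("utf-8") + b"=" + value
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def find_corrupted(props: dict, recorded: dict) -> list:
    """Return the sorted names of properties whose checksums do not match.

    `props` maps property names to values as read by a (hypothetical) loader;
    `recorded` maps property names to the checksums stored at dump time.
    """
    bad = []
    for name, value in props.items():
        if property_checksums(name, value) != recorded.get(name):
            bad.append(name)
    return sorted(bad)


# Checksums as they would be recorded when the dump is written.
props = {"svn:eol-style": b"native", "svn:mime-type": b"text/plain"}
recorded = {n: property_checksums(n, v) for n, v in props.items()}

# Simulate accidental corruption of one property value.
damaged = dict(props)
damaged["svn:eol-style"] = b"nativ\x00"

print(find_corrupted(damaged, recorded))  # prints ['svn:eol-style']
```

Because each property carries its own checksums, the loader can report
exactly which property is damaged. A single checksum over all names and
values could only say that something in the property list is wrong.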