On Sun, Feb 07, 2010 at 10:08:39PM +0000, matt wrote:
> not only has someone got to find a collision during a tiny timeframe,  
> they also have to fit it in 8k

MD5 collisions can, apparently, be constructed in 24 hours on a laptop these
days.  Yes, it's a constraint, but.

Venti supports larger blocks than 8K, and even if it didn't, the exhibited
MD5 collisions to date are short strings and would fit.  Working with larger
data feels like it would be unlikely to be advantageous to the collision
search -- 8K is much more than 20 bytes, and so there should be plenty of
pidgeons to cram into holes.

In any case, I fear that the point has been lost in details.  Something
about forests and trees.  So:

The point was not to say that you shouldn't be using venti; it's a lovely
idea and the code does its job somewhat well, bugs aside.  The point was to
object to the kind of analysis which postulates that you need to store 2^98
blocks with a 256 bit hash (or 2^50 for 160-bit hashes) before collisions
start be meaningful.

If you're storing iid uniformly-selected-at-random blocks of iid uniform
noise then yes, that's true, but you're not and besides the instantiation of
the random oracle model you're using (SHA-1) isn't perfect.  As such, we
should expect to see collisions after many fewer blocks stored -- strictly,
the iid uniform-at-random selection criterion yields an upper bound.

One way to demonstrate this upper-bound nature is to exhibit work on hash
collisions and the reduction in estimated security margins.  To take an
extreme case, the Fletcher checksum or your favorite CRC polynomial can be
extended to be perfectly valid 160 bit hash, but the lack of cryptographic
security makes it laughably easy to collide blocks.  Surely you'd be dubious
of a Venti where I replaced SHA-1 with such an extended Fletcher?  (It'd be
faster, too!)  Why?  Selection of blocks iid uniformly-at-random should mean
that it will take exactly as many blocks as before, before you see a
problem...  SHA-1 is not laughably easy to break by any stretch of the
imagination, but it also isn't the "impossibly hard" you'd get in the random
oracle model either.

(While nitpicking about the analysis, I feel compelled to add that hardware
uncorrectable bitflips are still reported as erasures, whereas venti
collisions are reported as success and only caught if somebody's doing
checksumming at larger granularity.  So at least in the one case you know
you're in trouble.  Collisions in venti act as if the disk corrupted so many
bits that we move into the correctable region for a different codeword.)

--nwf;

Attachment: pgpKRQORZwz0C.pgp
Description: PGP signature

Reply via email to