Re: [9fans] In case anyone worries about block hash collision in venti

hiro Tue, 09 Feb 2010 05:20:26 -0800

>I feel compelled to add that hardware
>uncorrectable bitflips are still reported as erasures, whereas venti
>collisions are reported as success and only caught if somebody's doing
>checksumming at larger granularity.


Uncorrectable bitflips can also be unreported.

If you are really paranoid and don't want any collisions in the next
10 years: don't let strangers in your venti.

On 2/9/10, Nathaniel W Filardo <n...@cs.jhu.edu> wrote:
> On Sun, Feb 07, 2010 at 10:08:39PM +0000, matt wrote:
>> not only has someone got to find a collision during a tiny timeframe,
>> they also have to fit it in 8k
>
> MD5 collisions can, apparently, be constructed in 24 hours on a laptop these
> days.  Yes, it's a constraint, but.
>
> Venti supports larger blocks than 8K, and even if it didn't, the exhibited
> MD5 collisions to date are short strings and would fit.  Working with larger
> data feels like it would be unlikely to be advantageous to the collision
> search -- 8K is much more than 20 bytes, and so there should be plenty of
> pidgeons to cram into holes.
>
> In any case, I fear that the point has been lost in details.  Something
> about forests and trees.  So:
>
> The point was not to say that you shouldn't be using venti; it's a lovely
> idea and the code does its job somewhat well, bugs aside.  The point was to
> object to the kind of analysis which postulates that you need to store 2^98
> blocks with a 256 bit hash (or 2^50 for 160-bit hashes) before collisions
> start be meaningful.
>
> If you're storing iid uniformly-selected-at-random blocks of iid uniform
> noise then yes, that's true, but you're not and besides the instantiation of
> the random oracle model you're using (SHA-1) isn't perfect.  As such, we
> should expect to see collisions after many fewer blocks stored -- strictly,
> the iid uniform-at-random selection criterion yields an upper bound.
>
> One way to demonstrate this upper-bound nature is to exhibit work on hash
> collisions and the reduction in estimated security margins.  To take an
> extreme case, the Fletcher checksum or your favorite CRC polynomial can be
> extended to be perfectly valid 160 bit hash, but the lack of cryptographic
> security makes it laughably easy to collide blocks.  Surely you'd be dubious
> of a Venti where I replaced SHA-1 with such an extended Fletcher?  (It'd be
> faster, too!)  Why?  Selection of blocks iid uniformly-at-random should mean
> that it will take exactly as many blocks as before, before you see a
> problem...  SHA-1 is not laughably easy to break by any stretch of the
> imagination, but it also isn't the "impossibly hard" you'd get in the random
> oracle model either.
>
> (While nitpicking about the analysis, I feel compelled to add that hardware
> uncorrectable bitflips are still reported as erasures, whereas venti
> collisions are reported as success and only caught if somebody's doing
> checksumming at larger granularity.  So at least in the one case you know
> you're in trouble.  Collisions in venti act as if the disk corrupted so many
> bits that we move into the correctable region for a different codeword.)
>
> --nwf;
>

Re: [9fans] In case anyone worries about block hash collision in venti

Reply via email to