On 07/11/2012 10:06 PM, Bill Sommerfeld wrote: > On 07/11/12 02:10, Sašo Kiselkov wrote: >> Oh jeez, I can't remember how many times this flame war has been going >> on on this list. Here's the gist: SHA-256 (or any good hash) produces a >> near uniform random distribution of output. Thus, the chances of getting >> a random hash collision are around 2^-256 or around 10^-77. > > I think you're correct that most users don't need to worry about this -- > sha-256 dedup without verification is not going to cause trouble for them. > > But your analysis is off. You're citing the chance that two blocks picked at > random will have the same hash. But that's not what dedup does; it compares > the hash of a new block to a possibly-large population of other hashes, and > that gets you into the realm of "birthday problem" or "birthday paradox". > > See http://en.wikipedia.org/wiki/Birthday_problem for formulas. > > So, maybe somewhere between 10^-50 and 10^-55 for there being at least one > collision in really large collections of data - still not likely enough to > worry about.
Yeah, I know, I did this as a quick first-degree approximation. However, the provided range is still very far above the chances of getting a random bit-rot error that even Fletcher won't catch. > Of course, that assumption goes out the window if you're concerned that an > adversary may develop practical ways to find collisions in sha-256 within the > deployment lifetime of a system. sha-256 is, more or less, a scaled-up sha-1, > and sha-1 is known to be weaker than the ideal 2^80 strength you'd expect from > 2^160 bits of hash; the best credible attack is somewhere around 2^57.5 (see > http://en.wikipedia.org/wiki/SHA-1#SHA-1). Of course, this is theoretically possible, however, I do not expect such an attack to be practical within any reasonable time frame of the deployment. In any case, should a realistic need to solve this arise, we can always simply switch hashes (I'm also planning to implement Skein-512/256) and do a recv/send to rewrite everything on disk. PITA? Yes. Serious problem? Don't think so. > on a somewhat less serious note, perhaps zfs dedup should contain "chinese > lottery" code (see http://tools.ietf.org/html/rfc3607 for one explanation) > which asks the sysadmin to report a detected sha-256 collision to > eprint.iacr.org or the like... How about we ask them to report to me instead, like so: 1) Detect collision 2) Report to me 3) ??? 4) Profit! Cheers, -- Saso _______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss