>>>>> Adam Olsen <rha...@gmail.com> (AO) wrote:

>AO> The Wayback Machine has 150 billion pages, so 2**37. Google's index
>AO> is a bit larger at over a trillion pages, so 2**40. A little closer
>AO> than I'd like, but that's still 562949950000000 to 1 odds of having
>AO> *any* collisions between *any* of the files. Step up to SHA-256 and
>AO> it becomes 191561940000000000000000000000000000000000000000000000 to
>AO> 1. Sadly, I can't even give you the odds for SHA-512, Qalculate
>AO> considers that too close to infinite to display. :)
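
For anyone who wants to reproduce odds of this kind, here is a minimal sketch using the usual birthday approximation p ~= n**2 / 2**(bits + 1). It assumes uniformly distributed hashes and n**2 much smaller than 2**bits. The function name odds_against_collision is made up for illustration, and this is not necessarily how the figures quoted above were computed.

```python
def odds_against_collision(n_files: int, hash_bits: int) -> int:
    """Approximate 'X to 1' odds against any collision among n_files hashes.

    Birthday approximation: p ~= n**2 / 2**(hash_bits + 1), so the odds
    against are roughly 2**(hash_bits + 1) / n**2. Assumes uniformly
    distributed hashes and n**2 much smaller than 2**hash_bits.
    """
    return 2 ** (hash_bits + 1) // n_files ** 2


if __name__ == "__main__":
    n = 2 ** 40  # roughly a trillion files
    for name, bits in [("MD5", 128), ("SHA-256", 256), ("SHA-512", 512)]:
        # Python integers are arbitrary precision, so SHA-512 is no problem.
        print(f"{name}: about {odds_against_collision(n, bits):.3e} to 1")
```

Under this approximation, 2**40 files with a 128-bit hash give exactly 2**49 to 1, which agrees with the first figure quoted above to the precision shown.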
>AO> You should worry more about your head spontaneously exploding than you
>AO> should about a hash collision on that scale. To do otherwise is
>AO> irrational paranoia.

And that is the probability of there being two files in that huge
collection with the same hash. If you just take two files, not
fabricated to collide, the probability of them having the same hash
under MD5 is 2**-128, which I think is far smaller than the probability
of the bit representing the answer being flipped by some physical cause
in your computer.

But then again, it doesn't make sense to use a hash instead of
byte-by-byte comparison if the files are on the same machine.
-- 
Piet van Oostrum <p...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org
-- 
http://mail.python.org/mailman/listinfo/python-list