On Apr 16, 8:59 am, Grant Edwards <inva...@invalid> wrote:
> On 2009-04-16, Adam Olsen <rha...@gmail.com> wrote:
> > I'm afraid you will need to back up your claims with real files.
> > Although MD5 is a smaller, older hash (128 bits, so you only need
> > 2**64 files to find collisions),
>
> You don't need quite that many to have a significant chance of
> a collision. With "only" something on the order of 2**61
> files, you still have about a 1% chance of a collision.

Aye, 2**64 is more of the middle of the curve or so. You can still go
either way. What's important is the order of magnitude required.

> For "a few million files" (we'll say 4e6), the probability of a
> collision is so close to 0 that it can't be calculated using
> double-precision IEEE floats.

≈ 0.000000000000000000000000023509887

Or 42535296000000000000000000 to 1.

Or 42 trillion trillion to 1.

> Here's the Python function I'm using:
>
> def bp(n, d):
>     return 1.0 - exp(-n*(n-1.)/(2.*d))
>
> I haven't spent much time studying the numerical issues of the
> way that the exponent is calculated, so I'm not entirely
> confident in the results for "small" n values such that
> p(n) == 0.0.

Try using Qalculate. I always resort to it for things like this.
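
For what it's worth, the 0.0 result looks like a rounding problem rather than a limit of doubles: for a tiny x, exp(-x) rounds to exactly 1.0, so 1.0 - exp(-x) comes out as 0.0. A minimal sketch of one way around it, assuming a Python version whose math module provides expm1. This swaps expm1 into the quoted bp() and is not the function as originally posted; the constant d is my own name for the number of possible MD5 digests.

```python
from math import expm1

def bp(n, d):
    """Approximate chance of at least one collision among n items
    drawn uniformly from d possible values (birthday bound)."""
    # -expm1(-x) is mathematically the same as 1 - exp(-x), but it
    # keeps its precision when x is tiny, where 1.0 - exp(-x)
    # rounds to 0.0.
    return -expm1(-n * (n - 1.0) / (2.0 * d))

d = 2.0 ** 128            # assumed: number of possible 128-bit digests

print(bp(4e6, d))         # a few million files: tiny but nonzero
print(bp(2.0 ** 61, d))   # a bit under 1%
print(bp(2.0 ** 64, d))   # somewhere near the middle of the curve
```

With this form the 4e6 case should come out on the order of 2.35e-26, in line with the figure above, instead of 0.0.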