On 11/14/2012 06:29 AM, Johannes Bauer wrote:
> <snip>
>
> When doing these calculations, it's important to keep the birthday
> paradox in mind (this is kind of counter-intuitive): The chance of a
> collission raises tremendously when we're looking for *any* arbitrary
> two hashes colliding within a certain namespace. The probability you've
> calculated is the pre-image probability (which you also again need to
> multiply with a factor of two, because when trying to collide one given
> hash, in the mean case you'll only have to search *half* the namespace
> before finding a collision).
> <snip>

Te birthday paradox could have been important had the OP stated his goal
differently.  What he said was:

"""Ideally I would want to avoid collisions altogether. But if that means 
significant extra CPU time then 1 collision in 10 million hashes would be 
tolerable."""

That means that he's willing to do the necessary overhead of collision
resolution, once in every 10 million lookups.  That's not the same as
saying that he wants only one chance in 10 million of having ANY
collisions among his data items.



-- 

DaveA

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to