On 14.11.2012 13:33, Dave Angel wrote:
> The birthday paradox could have been important had the OP stated his goal
> differently. What he said was:
>
> """Ideally I would want to avoid collisions altogether. But if that means
> significant extra CPU time then 1 collision in 10 million hashes would be
> tolerable."""
>
> That means that he's willing to do the necessary overhead of collision
> resolution, once in every 10 million lookups. That's not the same as
> saying that he wants only one chance in 10 million of having ANY
> collisions among his data items.

Since he stated in a later post that he actually went with MD5, the calculations are indeed relevant. They give the number of bits a perfect hash needs in order to keep the probability of collision resolutions as low as desired. For that, the birthday paradox probability must be considered instead of the (much lower) pre-image probability.

In any case, it appeared to me that the OP was looking for ideas and wasn't sure himself what approach to take, so I find it quite appropriate to give suggestions one way or another (even if they might not fit the exact phrasing of one of his postings).

Best regards,
Johannes

-- 
>> Where exactly did you predict the quake again?
> At least not publicly!
Ah, the latest and to this day most ingenious stunt of our great cosmologist: the secret prediction.
 - Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org>
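
As a rough illustration of the two probabilities discussed above, here is a minimal sketch. It assumes an ideal, uniformly distributed hash and uses the usual exponential approximation for the birthday bound. The function names and the figure of 10 million stored items are made up for the example; they do not come from the OP's postings.

```python
import math


def birthday_collision_probability(n_items, bits):
    """Approximate chance that ANY two of n_items share a hash value.

    Assumes an ideal hash with 2**bits equally likely outputs and uses
    the approximation p ~ 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    exponent = -n_items * (n_items - 1) / (2.0 * 2.0 ** bits)
    return -math.expm1(exponent)  # expm1 keeps precision for tiny values


def bits_for_birthday_bound(n_items, max_probability):
    """Smallest hash width (in bits) that keeps the approximate birthday
    probability at or below max_probability."""
    pairs = n_items * (n_items - 1) / 2.0
    # Solve 1 - exp(-pairs / 2**bits) <= p for bits.
    return math.ceil(math.log2(pairs / -math.log1p(-max_probability)))


def single_lookup_collision_probability(n_items, bits):
    """Chance that ONE new key collides with any of n_items stored keys."""
    return -math.expm1(n_items * math.log1p(-(2.0 ** -bits)))


if __name__ == "__main__":
    n = 10_000_000   # hypothetical number of stored items
    target = 1e-7    # "1 in 10 million", read as a bound on ANY collision
    width = bits_for_birthday_bound(n, target)
    print("bits needed for the birthday bound:", width)
    print("any collision at that width:",
          birthday_collision_probability(n, width))
    print("one new key colliding at that width:",
          single_lookup_collision_probability(n, width))
    print("any collision with 128 bits (MD5-sized):",
          birthday_collision_probability(n, 128))
```

The gap between the second and third printed values is the difference Dave points out: the chance that one particular lookup hits a collision is far smaller than the chance that some pair among all stored items collides.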