On Apr 16, 3:16 am, Nigel Rantor <wig...@wiggly.org> wrote:
> Adam Olsen wrote:
> > On Apr 15, 12:56 pm, Nigel Rantor <wig...@wiggly.org> wrote:
> >> Adam Olsen wrote:
> >>> The chance of *accidentally* producing a collision, although
> >>> technically possible, is so extraordinarily rare that it's completely
> >>> overshadowed by the risk of a hardware or software failure producing
> >>> an incorrect result.
> >> Not when you're using them to compare lots of files.
> >
> >> Trust me. Been there, done that, got the t-shirt.
> >
> >> Using hash functions to tell whether or not files are identical is an
> >> error waiting to happen.
> >
> >> But please, do so if it makes you feel happy, you'll just eventually get
> >> an incorrect result and not know it.
> >
> > Please tell us what hash you used and provide the two files that
> > collided.
>
> MD5
>
> > If your hash is 256 bits, then you need around 2**128 files to produce
> > a collision. This is known as a Birthday Attack. I seriously doubt
> > you had that many files, which suggests something else went wrong.
>
> Okay, before I tell you about the empirical, real-world evidence I have
> could you please accept that hashes collide and that no matter how many
> samples you use the probability of finding two files that do collide is
> small but not zero.

I'm afraid you will need to back up your claims with real files.

Although MD5 is a smaller, older hash (128 bits, so you only need around 2**64 files to expect a collision), and it has substantial known vulnerabilities, the scenario you suggest, where you *accidentally* find collisions (and you imply multiple collisions!), would be a rather significant finding. Please help us all by justifying your claim.

Mind you, since you use MD5 I wouldn't be surprised if your files were maliciously produced. As I said before, you need to consider upgrading your hash every few years to avoid new attacks.
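
For anyone who wants to put numbers on the birthday bound, here is a rough sketch. It assumes the hash behaves like a uniformly random function and that nobody is crafting inputs on purpose, which is exactly the "accidental" case. The function names are mine, picked for illustration, and the formula is the usual approximation, not an exact result.

```python
import math

def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files inputs share a hash.

    Uses the standard birthday approximation p ~= 1 - exp(-k / 2**bits),
    where k = n*(n-1)/2 is the number of distinct pairs. Assumes an ideal,
    uniformly distributed hash (an assumption, not a property of MD5).
    """
    pairs = num_files * (num_files - 1) // 2  # exact integer arithmetic
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2**hash_bits)

def files_for_half_chance(hash_bits: int) -> float:
    """Approximate number of files at which a collision becomes a coin flip."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (hash_bits / 2)

if __name__ == "__main__":
    for bits in (128, 256):  # MD5-sized and 256-bit digests
        print(bits, "bits:")
        print("  p(collision) for a billion files:",
              collision_probability(10**9, bits))
        print("  files needed for ~50% chance:",
              files_for_half_chance(bits))
```

With a 128-bit digest, a billion files gives a probability on the order of 1e-21, which is why an accidental MD5 collision would be such a surprising thing to stumble over.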