Re: a program to delete duplicate files

Patrick Useldinger Sat, 12 Mar 2005 09:00:06 -0800

François Pinard wrote:

> Identical hashes for different files?  The probability of this happening
> should be extremely small, or else, your hash function is not a good one.

We're talking about md5, sha1 or similar. They are all known not to be 100% perfect. I agree it's a rare case, but still, why settle on something "about right" when you can have "right"?

> I once was over-cautious about relying on hashes only, without actually
> comparing files.  A friend convinced me, doing maths, that with a good
> hash function, the probability of a false match was much, much smaller
> than the probability of my computer returning the wrong answer, despite
> thorough comparisons, due to some electronic glitch or cosmic ray.  So,
> my cautious attitude was by far, for all practical means, a waste.
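
For a sense of scale, the "doing maths" argument can be sketched with the birthday bound. This is only an illustration: it assumes an ideal hash whose outputs are uniformly random, the function name is made up for this example, and it says nothing about collisions that someone constructs on purpose.

```python
from fractions import Fraction

def collision_probability_bound(num_files, hash_bits):
    """Upper bound on the chance that any two of num_files distinct files
    share a digest, assuming an ideal (uniformly random) hash function."""
    # Number of distinct pairs of files that could collide.
    pairs = num_files * (num_files - 1) // 2
    # Each pair collides with probability 1 / 2**hash_bits; sum over all pairs.
    return Fraction(pairs, 2 ** hash_bits)

# One million distinct files and a 128-bit digest (the size of an md5 hash).
bound = collision_probability_bound(10**6, 128)
print(float(bound))
```

Under these assumptions the bound comes out on the order of 1e-27 for a million files, which is the kind of figure the quoted argument relies on.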

It was not my only argument for not using hashes. My algorithm also does fewer reads, for example.
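
To illustrate the point about reads, here is a minimal sketch of a direct comparison that stops at the first difference. It is not the program discussed in this thread, just an assumed two-file version of the idea; the function name and chunk size are arbitrary. A hash-based approach has to read every candidate file to the end, whereas this one reads only as far as the files agree.

```python
import os

def files_identical(path_a, path_b, chunk_size=64 * 1024):
    """Return True if both files have the same content, reading as little
    as possible: stop at the first chunk that differs."""
    # Files of different sizes cannot be identical; no reads needed at all.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # early exit: the rest is never read
            if not chunk_a:
                return True   # both files ended without a difference
```

When two same-sized files differ in their first chunk, this reads one chunk from each, while hashing would have read both files in full.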

-pu
--
http://mail.python.org/mailman/listinfo/python-list
