[Patrick Useldinger]
> Shouldn't you add the additional comparison time that has to be done
> after hash calculation? Hashes do not give 100% guarantee. If there's
> a large number of identical hashes, you'd still need to read all of
> these files to make sure.
Identical hashes for different files? The probability of this happening should be extremely small, or else your hash function is not a good one.

I was once over-cautious about relying on hashes alone, without actually comparing files. A friend convinced me, by doing the maths, that with a good hash function the probability of a false match was much, much smaller than the probability of my computer returning the wrong answer, despite thorough comparisons, because of some electronic glitch or cosmic ray. So, for all practical purposes, my cautious attitude was a waste. Similar arguments apply, say, to the confidence we may have in probabilistic algorithms for determining whether a number is prime.

--
François Pinard   http://pinard.progiciels-bpi.ca
--
http://mail.python.org/mailman/listinfo/python-list
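The "doing maths" step about false matches between distinct files can be sketched with the birthday approximation. This is only an illustration under stated assumptions: it treats the hash as if it were a uniformly random function onto 2**hash_bits values, and the file count and digest sizes used below are hypothetical choices, not figures from the thread. The function names `collision_probability` and `union_bound` are likewise made up for this sketch.

```python
import math


def collision_probability(n_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_files distinct files share
    a hash, assuming the hash acts like a uniform random function onto
    2**hash_bits values (birthday approximation: 1 - exp(-pairs / 2**bits))."""
    pairs = n_files * (n_files - 1) / 2
    expected = pairs / 2.0 ** hash_bits  # expected number of colliding pairs
    # -expm1(-x) == 1 - exp(-x), but keeps precision when x is tiny
    return -math.expm1(-expected)


def union_bound(n_files: int, hash_bits: int) -> float:
    """Simple upper bound: number of pairs times the per-pair collision chance."""
    return n_files * (n_files - 1) / 2 / 2.0 ** hash_bits


if __name__ == "__main__":
    n = 10**6  # hypothetical: one million distinct files
    for bits in (128, 160, 256):  # digest sizes of MD5, SHA-1, SHA-256
        print(f"{bits:3d}-bit hash, {n} files: "
              f"p ~ {collision_probability(n, bits):.3e}")
```

Even for a 128-bit digest and a million files, the estimate is on the order of 1e-27, which is the sense in which a false match is far less likely than a hardware error. The estimate says nothing about deliberately crafted collisions, which is a separate concern from accidental ones.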