Steven D'Aprano <[EMAIL PROTECTED]> writes:
> This isn't a criticism, it is a genuine question. Why do people compare
> local files with MD5 instead of doing a byte-to-byte compare? Is it purely
> a caching thing (once you have the checksum, you don't need to read the
> file again)? Are there any other reasons?

It's not just a matter of comparing two files. The idea is that you have 10,000 local files and you want to find which ones are duplicates (i.e. if files 637 and 2945 have the same contents, you want to discover that). The obvious way is to make a list of hashes and sort the list, so that files with identical contents end up next to each other.
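
For what it's worth, here is a rough sketch of that hash-and-sort approach in Python. The function names (file_digest, find_duplicates) and the 64 KB chunk size are made up for illustration, and it assumes every path is a readable regular file.

```python
import hashlib
import itertools
import operator


def file_digest(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never held in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicates(paths):
    """Return a list of path lists; each inner list shares one digest."""
    # One (digest, path) pair per file, so each file is read exactly once.
    pairs = sorted((file_digest(p), p) for p in paths)
    groups = []
    # After sorting, equal digests are adjacent, so groupby collects them.
    for _, run in itertools.groupby(pairs, key=operator.itemgetter(0)):
        names = [path for _, path in run]
        if len(names) > 1:
            groups.append(names)
    return groups
```

Each file gets read once and the rest is a sort, instead of comparing every file against every other one. If you don't want to trust the hash alone, you can still confirm each reported group with a byte-to-byte compare, e.g. filecmp.cmp(a, b, shallow=False).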