On Tue, 31 Jan 2006 13:38:50 -0800, Paul Rubin wrote:

> Steven D'Aprano <[EMAIL PROTECTED]> writes:
>> This isn't a criticism, it is a genuine question. Why do people compare
>> local files with MD5 instead of doing a byte-to-byte compare? Is it purely
>> a caching thing (once you have the checksum, you don't need to read the
>> file again)? Are there any other reasons?
>
> It's not just a matter of comparing two files. The idea is you have
> 10,000 local files and you want to find which ones are duplicates
> (i.e. if files 637 and 2945 have the same contents, you want to
> discover that). The obvious way is make a list of hashes, and sort
> the list.
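
For concreteness, the hash-and-sort idea quoted above might look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the standard-library `hashlib` module, and the helper names `file_md5` and `find_duplicates` are made up for this example, not taken from anyone's actual code.

```python
import hashlib
from itertools import groupby
from operator import itemgetter


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Hypothetical helper for this sketch; chunk_size is an arbitrary choice.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents share an MD5 digest.

    Each file is read once; sorting the (digest, path) pairs puts
    files with equal digests next to each other.
    """
    hashed = sorted((file_md5(p), p) for p in paths)
    groups = []
    for _, group in groupby(hashed, key=itemgetter(0)):
        same = [path for _, path in group]
        if len(same) > 1:
            groups.append(same)
    return groups
```

Each file is read only once, whereas comparing every pair byte for byte would take on the order of 50 million comparisons for 10,000 files. A matching digest is very strong evidence of identical contents rather than proof, so a byte-for-byte check within each reported group can confirm the result.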

Sure. But if you are just comparing two files, is there any reason to
bother with a checksum? (MD5 or other.) I can't see any, but I thought
maybe that's because I'm not thinking outside the box.

--
Steven.