Re: Why checksum? [was Re: Fuzzy Lookups]

Paul Rubin Tue, 31 Jan 2006 13:40:50 -0800

Steven D'Aprano <[EMAIL PROTECTED]> writes:
> This isn't a criticism, it is a genuine question. Why do people compare
> local files with MD5 instead of doing a byte-to-byte compare? Is it purely
> a caching thing (once you have the checksum, you don't need to read the
> file again)? Are there any other reasons?


It's not just a matter of comparing two files.  The idea is that you
have 10,000 local files and you want to find which ones are duplicates
(i.e. if files 637 and 2945 have the same contents, you want to
discover that).  The obvious way is to make a list of hashes and sort
the list, so that files with identical contents end up next to each
other.
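
Here is a minimal sketch of that approach, assuming the files fit the
usual pattern of being readable from disk by path.  The names
file_digest and find_duplicates and the chunk size are made up for
illustration; only hashlib and itertools come from the standard
library.

```python
import hashlib
from itertools import groupby
from operator import itemgetter


def file_digest(path, chunk_size=65536):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Hash every file, sort by hash, and return groups of matching paths."""
    # One (digest, path) pair per file; sorting puts equal digests together.
    hashed = sorted((file_digest(p), str(p)) for p in paths)
    dupes = []
    for _, group in groupby(hashed, key=itemgetter(0)):
        names = [name for _, name in group]
        if len(names) > 1:
            dupes.append(names)
    return dupes
```

Each file is read once, and the sort does the matching.  If you are
worried about hash collisions, you could confirm each reported group
with a byte-to-byte compare such as filecmp.cmp(a, b, shallow=False).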
