Re: Why checksum? [was Re: Fuzzy Lookups]

Steven D'Aprano Wed, 01 Feb 2006 03:05:46 -0800

On Tue, 31 Jan 2006 13:38:50 -0800, Paul Rubin wrote:

> Steven D'Aprano <[EMAIL PROTECTED]> writes:
>> This isn't a criticism, it is a genuine question. Why do people compare
>> local files with MD5 instead of doing a byte-to-byte compare? Is it purely
>> a caching thing (once you have the checksum, you don't need to read the
>> file again)? Are there any other reasons?
> 
> It's not just a matter of comparing two files.  The idea is you have
> 10,000 local files and you want to find which ones are duplicates
> (i.e. if files 637 and 2945 have the same contents, you want to
> discover that).  The obvious way is make a list of hashes, and sort
> the list.
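
The hash-and-sort approach described above can be sketched roughly as
follows. This is only an illustration in present-day Python, not code
from the thread: the names file_digest and find_duplicates and the
chunk size are assumptions made for the example.

```python
import hashlib
import itertools


def file_digest(path, chunk_size=1 << 16):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths whose contents hash to the same digest.

    Builds a list of (digest, path) pairs, sorts it so that equal
    digests end up adjacent, and keeps every run longer than one.
    """
    pairs = sorted((file_digest(p), str(p)) for p in paths)
    groups = []
    for _digest, run in itertools.groupby(pairs, key=lambda pair: pair[0]):
        names = [name for _, name in run]
        if len(names) > 1:
            groups.append(names)
    return groups
```

Each file is read once, so the cost is one pass over the data plus a
sort of the digests, rather than a pairwise comparison of every file
against every other. Files in the same group share a digest; a final
byte-for-byte check would rule out a hash collision.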


Sure. But if you are just comparing two files, is there any reason to
bother with a checksum? (MD5 or other.)

I can't see any, but I thought maybe that's because I'm not thinking
outside the box.
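
For the two-file case, a direct comparison might look something like
the sketch below. The function name same_contents and the chunk size
are made up for illustration, and it assumes both paths are regular
files on local disk.

```python
import os


def same_contents(path_a, path_b, chunk_size=1 << 16):
    """Return True if two regular files hold identical bytes."""
    # Files of different sizes cannot match, so no reading is needed.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False  # stop at the first differing chunk
            if not a:
                return True   # both files ended without a mismatch
```

Unlike a checksum, this can stop at the first mismatch instead of
reading both files to the end. The standard library's
filecmp.cmp(path_a, path_b, shallow=False) performs a similar
content comparison.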


-- 
Steven.

