On Sat, 24 Dec 2005 15:47:17 +1100, Steven D'Aprano wrote:

> On Fri, 23 Dec 2005 17:10:22 +0000, Dan Stromberg wrote:
>
>> I'm treating each file as a potentially very large string, and "sorting
>> the strings".
>
> Which is a very strange thing to do, but I'll assume you have a good
> reason for doing so.

I believe what the original poster wants to do is eliminate duplicate
content from a collection of ogg/whatever files with different names.
E.g., he has a Python script that goes out and collects all the free
music it can find on the web. The same song may appear on many sites
under different names, and he wants only one copy of a given song.

In any case, as others have pointed out, sorting by MD5 is sufficient.
The exceptions are accidental collisions, which are far less probable
than hardware failure, and deliberate collisions. E.g., the RIAA could
create collision pairs of MP3 files where one member carries a freely
redistributable license and the other a "copy this and we'll sue your
ass off" license, in an effort to trap the unwary.

--
Stuart D. Gathman <[EMAIL PROTECTED]>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
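
P.S. A minimal sketch of the MD5 grouping idea, assuming a current Python
with hashlib and filecmp in the standard library. The names file_md5,
group_duplicates and truly_identical, and the 1 MiB chunk size, are made
up for illustration; this is not the original poster's script. Grouping
by (size, digest) finds the candidates, and the byte-for-byte comparison
is the extra check you would want if deliberate collisions are a worry.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Group paths by (size, MD5); return only groups with more than one file."""
    by_key = defaultdict(list)
    for path in paths:
        key = (os.path.getsize(path), file_md5(path))
        by_key[key].append(path)
    return [sorted(group) for group in by_key.values() if len(group) > 1]


def truly_identical(path_a, path_b):
    """Compare contents byte for byte (shallow=False ignores os.stat shortcuts)."""
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Each list returned by group_duplicates holds files that share a size and
a digest; running truly_identical over a group before deleting anything
costs one more read of each file but does not depend on MD5 at all.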