Re: A sets algorithm

2016-02-08 Thread Gregory Ewing
Chris Angelico wrote:
> hash_to_filename = defaultdict(list)
> for fn in files:
>     # Step 1: Hash every file.
>     hash = calculate_hash(fn)
>     # Step 2: Locate all pairs of files with identical hashes
>     hash_to_filename[hash].append(fn)
I think you can avoid hashing the files altogether. Firs
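
For reference, the hash-grouping approach quoted from Chris Angelico might be sketched as follows. This is only an illustration under stated assumptions: files are given as paths, the body of calculate_hash (SHA-256 via hashlib, read in chunks) is a guess at what such a helper could do, and group_by_hash is a hypothetical wrapper name not taken from the thread.

```python
import hashlib
from collections import defaultdict


def calculate_hash(path, chunk_size=65536):
    """Assumed helper: SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(files):
    """Hypothetical wrapper: return lists of paths that share a hash."""
    hash_to_filename = defaultdict(list)
    for fn in files:
        # Step 1: hash every file; Step 2: bucket names by that hash.
        hash_to_filename[calculate_hash(fn)].append(fn)
    # Singleton buckets are not interesting, so drop them.
    return [names for names in hash_to_filename.values() if len(names) > 1]
```

Each file is read once, so the cost grows linearly with the total amount of data rather than with the number of file pairs.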

Re: A sets algorithm

2016-02-08 Thread Chris Angelico
On Tue, Feb 9, 2016 at 3:13 PM, Steven D'Aprano wrote: > On Tuesday 09 February 2016 02:11, Chris Angelico wrote: > >> That's fine for comparing one file against one other. He started out >> by saying he already had a way to compare files for equality. What he >> wants is a way to capitalize on th

Re: A sets algorithm

2016-02-08 Thread Steven D'Aprano
On Tuesday 09 February 2016 02:11, Chris Angelico wrote: > That's fine for comparing one file against one other. He started out > by saying he already had a way to compare files for equality. What he > wants is a way to capitalize on that to find all the identical files > in a group. A naive appro

Re: A sets algorithm

2016-02-08 Thread Chris Angelico
On Tue, Feb 9, 2016 at 1:49 AM, Random832 wrote: > On Sun, Feb 7, 2016, at 20:07, Cem Karan wrote: >> a) Use Chris Angelico's suggestion and hash each of the files (use the >> standard library's 'hashlib' for this). Identical files will always have >> identical hashes, but there may be fa

Re: A sets algorithm

2016-02-08 Thread Random832
On Sun, Feb 7, 2016, at 20:07, Cem Karan wrote: > a) Use Chris Angelico's suggestion and hash each of the files (use the > standard library's 'hashlib' for this). Identical files will always have > identical hashes, but there may be false positives, so you'll need to verify > that files t
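
The verification step mentioned in the quoted text (a matching hash does not prove the files are identical) could be sketched like this. It is an assumption-laden illustration, not anything posted in the thread: split_into_equal_sets is a hypothetical name, the inputs are assumed to be paths that already share a hash, and the byte-for-byte check uses the standard library's filecmp.cmp with shallow=False.

```python
import filecmp


def split_into_equal_sets(candidates):
    """Partition candidate paths into lists of byte-identical files.

    Hypothetical helper: `candidates` is assumed to be a group of paths
    whose hashes already matched.
    """
    classes = []
    for path in candidates:
        for cls in classes:
            # Compare against one representative of each known class.
            if filecmp.cmp(cls[0], path, shallow=False):
                cls.append(path)
                break
        else:
            # No existing class matched, so start a new one.
            classes.append([path])
    # Only sets with more than one file are of interest.
    return [cls for cls in classes if len(cls) > 1]
```

Because each candidate is compared only against one representative per class, a group of truly identical files needs one comparison per file rather than one per pair.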

Re: A sets algorithm

2016-02-07 Thread Paulo da Silva
At 21:46 on 07-02-2016, Paulo da Silva wrote: > Hello! > > This may not be a strict python question, but ... > > Suppose I have already a class MyFile that has an efficient method (or > operator) to compare two MyFile s for equality. > > What is the most efficient way to obtain all sets of eq

Re: A sets algorithm

2016-02-07 Thread Cem Karan
On Feb 7, 2016, at 4:46 PM, Paulo da Silva wrote: > Hello! > > This may not be a strict python question, but ... > > Suppose I have already a class MyFile that has an efficient method (or > operator) to compare two MyFile s for equality. > > What is the most efficient way to obtain all sets

Re: A sets algorithm

2016-02-07 Thread Tim Chase
On 2016-02-08 00:05, Paulo da Silva wrote:
> At 22:17 on 07-02-2016, Tim Chase wrote:
>> all_files = list(generate_MyFile_objects())
>> interesting = [
>>     (my_file1, my_file2)
>>     for i, my_file1
>>     in enumerate(all_files, 1)
>>     for my_file2
>>     in all_files[i:]
>>     if m
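
The paired search in the quoted comprehension can also be written with itertools.combinations. This is a rough sketch rather than the poster's code: equal_pairs is a hypothetical name, and it assumes only that the objects support == (as the MyFile class in the thread is said to).

```python
from itertools import combinations


def equal_pairs(items):
    """Return every unordered pair (a, b) of items where a == b.

    Hypothetical helper; performs one comparison per pair, so the cost
    is quadratic in the number of items.
    """
    return [(a, b) for a, b in combinations(items, 2) if a == b]
```

For n objects this makes n*(n-1)/2 comparisons, which is why it only suits small collections.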

Re: A sets algorithm

2016-02-07 Thread Paulo da Silva
At 22:17 on 07-02-2016, Tim Chase wrote: > On 2016-02-07 21:46, Paulo da Silva wrote: ... > > If the MyFile objects can be unique but compare for equality > (e.g. two files on the file-system that have the same SHA1 hash, but > you want to know the file-names), you'd have to do a paired se

Re: A sets algorithm

2016-02-07 Thread Tim Chase
On 2016-02-07 21:46, Paulo da Silva wrote: > Suppose I have already a class MyFile that has an efficient method > (or operator) to compare two MyFile s for equality. > > What is the most efficient way to obtain all sets of equal files (of > course each set must have more than one file - all single

Re: A sets algorithm

2016-02-07 Thread Oscar Benjamin
On 7 Feb 2016 21:51, "Paulo da Silva" wrote: > > Hello! > > This may not be a strict python question, but ... > > Suppose I have already a class MyFile that has an efficient method (or > operator) to compare two MyFile s for equality. > > What is the most efficient way to obtain all sets of equal

Re: A sets algorithm

2016-02-07 Thread Chris Angelico
On Mon, Feb 8, 2016 at 8:46 AM, Paulo da Silva wrote: > Hello! > > This may not be a strict python question, but ... > > Suppose I have already a class MyFile that has an efficient method (or > operator) to compare two MyFile s for equality. > > What is the most efficient way to obtain all sets of

A sets algorithm

2016-02-07 Thread Paulo da Silva
Hello! This may not be a strict python question, but ... Suppose I have already a class MyFile that has an efficient method (or operator) to compare two MyFile s for equality. What is the most efficient way to obtain all sets of equal files (of course each set must have more than one file - all