On 2016-02-07 21:46, Paulo da Silva wrote: > Suppose I have already a class MyFile that has an efficient method > (or operator) to compare two MyFile s for equality. > > What is the most efficient way to obtain all sets of equal files (of > course each set must have more than one file - all single files are > discarded)?
If your MyFile grows a __hash__ method that is based off the uniqueness-qualities that you're comparing, you can use collections.Counter and filter the results: from collections import Counter interesting = [ my_file for my_file, count in Counter(generate_MyFile_objects()).iteritems() if count > 1 ] Which should be roughly O(n). It also has the advantage of iterating over your generator once. If you the MyFile objects can be unique but compare for equality (e.g. two files on the file-system that have the same SHA1 hash, but you want to know the file-names), you'd have to do a paired search which would have worse performance and would need to iterate over the data multiple times: all_files = list(generate_MyFile_objects()) interesting = [ (my_file1, my_file2) for i, my_file1 in enumerate(all_files, 1) for my_file2 in all_files[i:] if my_file1 == my_file2 ] -tkc -- https://mail.python.org/mailman/listinfo/python-list