At 22:17 on 07-02-2016, Tim Chase wrote:
> On 2016-02-07 21:46, Paulo da Silva wrote:
...
>
> If the MyFile objects can be unique but compare for equality
> (e.g. two files on the file-system that have the same SHA1 hash, but
> you want to know the file-names), you'd have to do a paired search
> which would have worse performance and would need to iterate over the
> data multiple times:
>
>   all_files = list(generate_MyFile_objects())
>   interesting = [
>     (my_file1, my_file2)
>     for i, my_file1
>     in enumerate(all_files, 1)
>     for my_file2
>     in all_files[i:]
>     if my_file1 == my_file2
>     ]
>

"my_file1 == my_file2" can be implemented in the MyFile class, taking
advantage of cached sizes (files of different sizes are different),
hashes, or even content (for small files) or file headers (the first n
bytes).

However, this seems to have a problem:

all_files: a b c d e ...

If a == b, then comparing b with c, d, e is useless, because a has
already been compared with each of them.

Maybe use several steps with dicts: sizes first, then hashes for files
of the same size, etc.

Another solution I thought of would be to define some methods (I still
don't know which ones) in MyFile so that I could use set intersection.
Would this be a faster solution?

Thanks
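
The multi-step dict idea above (sizes first, then hashes only for files that share a size) could be sketched roughly as follows. This is only an illustration under assumptions: it works on plain file paths rather than on the MyFile class, whose definition is not shown in this thread, and the names sha1_of, refine and duplicate_groups are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 16):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refine(groups, key):
    """Split each group by key(path); keep only subgroups with 2+ members."""
    refined = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        refined.extend(g for g in buckets.values() if len(g) > 1)
    return refined


def duplicate_groups(paths):
    """Group paths by size first, then by SHA-1 within each size group."""
    groups = [list(paths)]
    # Cheap key first: the hash is computed only for files whose size
    # matches the size of at least one other file.
    for key in (os.path.getsize, sha1_of):
        groups = refine(groups, key)
    return groups
```

Each file is placed into a bucket once per step, so no pair is ever compared directly and the "a == b, so skip b against c, d, e" problem does not arise. As for sets: a set (like a dict key) relies on __hash__ and __eq__, and intersection finds the members that two sets have in common, so grouping with a dict is probably the closer fit for finding duplicates inside a single collection. A further step, such as a byte-by-byte comparison with filecmp.cmp(..., shallow=False), could be added if hash collisions are a concern.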