Eli the Bearded wrote:

> In comp.lang.python, Peter Otten <__pete...@web.de> wrote:
>> Eli the Bearded wrote:
>>> But what caught my eye most, as someone relatively new to Python but
>>> with long experience in C in Perl, is sorting doesn't take a
>
> s/C in /C and/
>
> Ugh.
>
>>> *comparison* function, it takes a *key generator* function, and that
>>> function is supposed to transform the thing into something that the
>>> native comparison knows how to compare.
>>>
>>> This seems a strange choice, and I'm wondering if someone can explain
>>> the benefits of doing it that way to me.
>>
>> Python 2 started with a comparison function and then grew a key function.
>> With a key function you still have to compare items, you are just
>> breaking the comparison into two steps:
>
> [snip]
>
> Thanks for that good explanation. The benchmark comparison makes it
> very thorough.
>
> In my mind I gravitate towards the complicated sorts of sort that can be
> quickly compared for some sorts of keys and not as quickly for others.
>
> Consider a sort that first compares file size and if the same number of
> bytes, then compares file checksum. Any decently scaled real world
> implementation would memoize the checksum for speed, but only work it out
> for files that do not have a unique file size. The key method requires
> it worked out in advance for everything.
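In plain key-function terms that sort would look something like the sketch
below (getsize and getchecksum stand in for whatever helpers return a file's
size and checksum):

# sketch: tuple keys sort by size first, then by checksum,
# but the checksum has to be computed up front for every file
files.sort(key=lambda f: (getsize(f), getchecksum(f)))

which is exactly the up-front cost described above.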
Oscar already mentioned the functools.cmp_to_key decorator, which makes this
a non-issue:

def mycmp(a, b): ...
files.sort(key=cmp_to_key(mycmp))

Applied to your example, with memoization:

# untested; getsize and getchecksum are assumed helpers that return a
# file's size and checksum (e.g. os.path.getsize and a hashlib-based digest)
import functools

def cmp(a, b):
    return (a > b) - (a < b)

def make_file_key():
    # cache the expensive lookups so each file is stat-ed/hashed at most once
    size = functools.lru_cache(None)(getsize)
    checksum = functools.lru_cache(None)(getchecksum)

    @functools.cmp_to_key
    def file_key(a, b):
        # compare by size first; fall back to the checksum only on a tie
        return cmp(size(a), size(b)) or cmp(checksum(a), checksum(b))

    return file_key

files.sort(key=make_file_key())

> But I see the key method handles the memoization under the hood for you,
> so those simpler, more common sorts of sort get an easy to see benefit.
>
> Elijah
> ------
> even memoizing the stat() calls would help for large lists

PS: If you are sorting files by size and checksum as part of a deduplication
effort, consider using dicts instead:

from collections import defaultdict

def grouped(items, key):
    result = defaultdict(list)
    for item in items:
        result[key(item)].append(item)
    return result

# only checksum files whose size is not already unique
for same_size in grouped(files, key=getsize).values():
    if len(same_size) > 1:
        for same_checksum in grouped(same_size, key=getchecksum).values():
            if len(same_checksum) > 1:
                print(same_checksum)
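PPS: a small self-contained sketch of the "memoization under the hood" point:
sort() and sorted() call the key function exactly once per element, so any
expensive work inside it runs at most once per item.

calls = []

def counting_key(item):
    calls.append(item)  # stand-in for an expensive stat() or checksum
    return item

sorted([30, 10, 20, 50, 40], key=counting_key)
print(len(calls))  # 5: one call per element, no matter how many comparisons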