On Wed, 27 Jan 2010 16:16:59 -0800, Jonathan Gardner wrote: > On Jan 27, 3:54 pm, Paul Rubin <no.em...@nospam.invalid> wrote: >> Steven D'Aprano <ste...@remove.this.cybersource.com.au> writes: >> > always much better written with key rather than cmp: key adds an O(N) >> > overheard to the sorting, while cmp makes sorting O(N**2). >> >> Whaaaaaaaaaa ...... ???? No I don't think so. > > You're referring to the O(N**2) bit, right? I am sure he knew it was O > (N*log(N)), which is still worse than O(N) for key. > > If he didn't, well, Python has some fundamental flaws in its basic sort > algorithm.
I may have the details wrong, but the performance of sort with cmp is very poor. sort with key calls the key function once for each element in the list, which is O(N), and then sorts O(N*log N), giving a total of O(N*log N). sort with cmp has to compare the comparison function multiple times for each element. I *thought* it ended up calling it N times each, giving O(N**2), but I may be delusional. Whatever it is though, it's quite slow. Here's my test code (for Python 2.5): from timeit import Timer from string import letters from random import shuffle def mycmp(s, t): return cmp(s.upper(), t.upper()) def randstr(): L = list(letters) shuffle(L) return ''.join(L) setup = "from __main__ import mycmp, data" N = 300 data = [randstr() for _ in xrange(N)] t1 = Timer("d = data[:]; d.sort()", setup) # raw sort t2 = Timer("d = data[:]; d.sort(cmp=mycmp)", setup) t3 = Timer("d = data[:]; d.sort(key=str.upper)", setup) for t in (t1, t2, t3): print min(t.repeat(number=500)) N *= 10 # Do it again with bigger N. data = [randstr() for _ in xrange(N)] for t in (t1, t2, t3): print min(t.repeat(number=500)) And here are the results on my PC: # N = 300 0.0631489753723 2.36756515503 0.226485967636 # N = 3000 1.03457903862 35.3092911243 2.77242517471 This doesn't support my claim of O(N**2), but it does demonstrate that sort with cmp should be avoided if at all possible. -- Steven -- http://mail.python.org/mailman/listinfo/python-list