On Aug 3, 8:38 am, [EMAIL PROTECTED] (Alex Martelli) wrote: > [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > A naive approach to rank ordering (handling ties as well) of nested > > lists may be accomplished via: > > > def rankLists(nestedList): > > def rankList(singleList): > > sortedList = list(singleList) > > sortedList.sort() > > return map(sortedList.index, singleList) > > return map(rankList, nestedList) > > > >>> unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2, > > 0, -1.1, 13 ] ] > > >>> print rankLists(unranked) > > > [[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]] > > > This works nicely when the dimensions of the nested list are small. > > It is slow when they are big. Can someone suggest a clever way to > > speed it up? > > Each use of sortedList.index is O(N) [where N is len(singleList)], and > you have N such uses in the map in the inner function, so this approach > is O(N squared). Neil's suggestion to use bisect replaces the O(N) > .index with an O(log N) search, so the overall performance is O(N log N) > [[and you can't do better than that, big-O wise, because the sort step > is also O(N log N)]]. > > "beginner"'s advice to use a dictionary is also good and may turn out to > be faster, just because dicts are SO fast in Python -- but you need to > try and measure both alternatives. One way to use a dict (warning, > untested code): > > def rankList(singleList): > d = {} > for i, item in reversed(enumerate(sorted(singleList))): > d[item] = i > return [d[item] for item in singleList] > > If you find the for-clause too rich in functionality, you can of course > split it up a bit; but note that you do need the 'reversed' to deal with > the corner case of duplicate items (without it, you'd end up with 1s > instead of 0s for the third one of the sublists in your example). If > speed is of the essence you might try to measure what happens if you > replace the returned expression with map(d.__getitem__, singleList), but > I suspect the list comprehension is faster as well as clearer. > > Another potential small speedup is to replace the first 3 statements > with just one: > > d = dict((item,i) for i,item in reversed(enumerate(sorted(singleList)))) > > but THIS density of functionality is a bit above my personal threshold > of comfort ("sparse is better than dense":-). > > Alex
Thanks for breaking it down so thoroughly. I try the different recipes and see which gives the best trade offs between readability and performance. Agreed that dict() approach looks promising. -- http://mail.python.org/mailman/listinfo/python-list