On Thu, 12 Dec 2013 16:08:33 +0100, Peter Otten wrote:

> Steven D'Aprano wrote:
[...]
>> So, ideally I'd like to write my code like this:
>>
>> table = [(x, i) for i, x in enumerate(iterable)]
>> table.sort()
>> if len(table) < ?????:
>>     table = [i for x, i in table]
>> else:
>>     for j, (x, i) in enumerate(table):
>>         table[j] = i
>>
>> where ????? no doubt will depend on how much memory is available in
>> one contiguous chunk.
>>
>> Is there any way to determine which branch I should run, apart from
>> hard-coding some arbitrary and constant cut-off value?
>
> How about using two lists?
>
> keys = list(iterable)
> values = range(len(keys))
> values.sort(key=keys.__getitem__)
> del keys
>
> The intention is to save memory used for the 2-tuples; I don't know
> if they pop up elsewhere.
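Before answering that: spelled out as a self-contained function, I read
the suggestion like this (the argsort name and the test line are mine,
and it's Python 2.x like everything else in this post; under Python 3
you would need values = list(range(len(keys))), since range objects
cannot be sorted in place):

    def argsort(iterable):
        keys = list(iterable)      # the values themselves
        values = range(len(keys))  # indexes 0..n-1 (a plain list in 2.x)
        # Sort the indexes by the value each index refers to.
        values.sort(key=keys.__getitem__)
        del keys  # drop the values as soon as they are no longer needed
        return values

    assert argsort(['b', 'c', 'a']) == [2, 0, 1]

That produces the same table of indexes as the two-tuple versions
discussed below.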
They don't pop up anywhere else, no. On the other hand, you've now got
two lists where before I only had one.

In my post, I said that the code I showed was a simplification. What I
actually have is two branches. If the given iterable is a sequence type
(say, a list) I use something quite similar to your code above:

    if isinstance(iterable, collections.Sequence):
        getitem = iterable.__getitem__
        if key is not None:
            # Compose the given key with getitem.
            f = lambda i: key(getitem(i))
        else:
            f = getitem
        table = range(len(iterable))  # Python 2.x version
        table.sort(key=f, reverse=reverse)

which avoids any temporary two-tuples, and creates only the one
additional list. I can't avoid making that list, since it's the result
required. I don't think that this branch of the code can be optimized
in any serious way; I can't see any additional fat to be trimmed off
it. If the input list is so large as to cause thrashing, the only
solution is to buy more memory.

The other branch handles iterators, and I have two versions. The
idiomatic version is faster for small data sets (small in this case
meaning "up to at least one million items on my computer") but ends up
building at least one extra temporary list which is thrown away:

    else:
        table = [(x, i) for (i, x) in enumerate(iterable)]
        table.sort(key=key, reverse=reverse)
        table = [i for x, i in table]

I've experimented with slight variations, e.g. using sorted() and a
generator expression instead of a list comp, and the differences are
trivial. The important thing here is that there's a temporary list
which in some cases is *very large* but destined to be thrown away.
Having two very large lists (for large enough input sizes) causes
thrashing.

To avoid that thrashing, I have a second version which builds only one
list and then modifies it in place. For very large input sizes (ten
million items) this is significantly faster:

    else:
        table = [(x, i) for (i, x) in enumerate(iterable)]
        table.sort(key=key, reverse=reverse)
        for j, (x, i) in enumerate(table):
            table[j] = i

but much slower, and less idiomatic, for small input sizes.

I don't know of any reasonable way to tell at runtime which of the two
algorithms I ought to use. Hard-coding an arbitrary value ("if
len(table) > 5000000") is not the worst idea I've ever had, but I'm
hoping for something more deterministic which can be cheaply calculated
at runtime.
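The nearest thing to deterministic I can see is tying the cut-off to
the machine rather than to a magic number: estimate the extra memory
the throw-away list would cost and compare it against what is actually
free. A rough sketch, and only a sketch: the can_afford_copy name and
the safety factor are my own invention, it assumes the third-party
psutil package, and it ignores the question of whether free memory is
contiguous. The observation behind it is that the comprehension's extra
footprint is essentially one new pointer array of len(table) slots,
since the index objects themselves are shared with the 2-tuples:

    import sys
    import psutil  # third-party; not in the standard library

    def can_afford_copy(n, safety=0.5):
        """Guess whether building a second list of n items, alongside
        the existing list of n 2-tuples, stays inside physical memory.
        """
        per_slot = sys.getsizeof([0]) - sys.getsizeof([])  # one pointer
        extra = sys.getsizeof([]) + n * per_slot
        return extra < psutil.virtual_memory().available * safety

so the iterator branch would become:

    if can_afford_copy(len(table)):
        table = [i for x, i in table]
    else:
        for j, (x, i) in enumerate(table):
            table[j] = i

Whether 0.5 is anywhere near the right safety factor I couldn't say
without measuring.

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list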