On Thu, Dec 12, 2013 at 11:08 PM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> P.S. The algorithm I'm working on is a way of generating index and rank
> tables. Not that it really matters -- what matters is determining whether
> or not to shift from "make a copy of the list" to "modify the list in
> place".
So you're currently looking at...

    if len(table) < ?????:
        table = [i for x, i in table]
    else:
        for x, i in table:
            table[i] = x

Can I throw a spanner [1] in the works with other suggestions to try
timing?

    table[:] = [i for x, i in table]  # Does slice assignment get optimized?

    SPLIT = 1048576  # Pick some useful cutoff
    for part in range(0, len(table), SPLIT):
        table[part:part+SPLIT] = [i for x, i in table[part:part+SPLIT]]

If slice assignment is reasonably fast (which I suspect it is), the
one-liner should be practically identical to your small-table one-liner.
Then, if a million-record chunk can be processed in memory, it ought to be
possible to do the whole job in comparable time, even if the total table
length is huge. The choice of SPLIT would then matter a lot less than the
cutoff you're trying to find; you might have been able to do it in half
the number of sections, but that won't make as big a difference as
suddenly paging out.

Ideally, what I'd like to see is that a higher SPLIT improves performance
slightly until it gets too high, at which point you go bust and the dealer
wins... but the critical word there is "slightly", meaning that it
wouldn't cost too much for SPLIT to be lower than it needs to be. That's
the criterion for the experiment; do you have the data on which to try it?

[1] Monkey wrench, for the Yanks in the room

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
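
For anyone wanting to reproduce the timing comparison, here is a rough
sketch (not from the thread itself): N, SPLIT, and the synthetic
(value, index) table are placeholder assumptions, and a real run would
sweep SPLIT upward and watch for the point where paging kicks in.

    # Hypothetical timing harness for the rebuild strategies discussed above.
    # N and SPLIT are made-up placeholders; the real data would come from
    # whatever tables the original poster is generating.
    import timeit

    N = 10**6          # hypothetical table size
    SPLIT = 1048576    # hypothetical chunk size

    # Stand-in table of (value, index) pairs
    base = [(x, i) for i, x in enumerate(range(N))]

    def rebuild_copy(table):
        return [i for x, i in table]

    def rebuild_slice(table):
        table[:] = [i for x, i in table]

    def rebuild_chunked(table, split=SPLIT):
        for part in range(0, len(table), split):
            table[part:part + split] = [i for x, i in table[part:part + split]]

    for fn in (rebuild_copy, rebuild_slice, rebuild_chunked):
        # list(base) hands each run a fresh table, since the mutating
        # variants replace the pairs with plain ints; the copy cost is
        # included equally for all three, so comparisons stay fair.
        t = timeit.timeit(lambda: fn(list(base)), number=5)
        print(fn.__name__, t)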