On Tue, 5 Jul 2005, Ron Adam wrote: > Tom Anderson wrote: > >> The trouble with these is that they make a lot of temporary lists - >> George's version does it with the recursive calls to flatten, and Ron's >> with the slicing and concatenating. How about a version which never >> makes new lists, only appends the base list? We can use recursion to >> root through the lists ... > > Ok... How about a non-recursive flatten in place? ;-)
How about, er, oh, i give up. > def flatten(seq): > i = 0 > while i!=len(seq): > while isinstance(seq[i],list): > seq.__setslice__(i,i+1,seq[i]) > i+=1 > return seq > > seq = [[1,2],[3],[],[4,[5,6]]] > print flatten(seq) > > I think I'll be using the __setslice__ method more often. Um, can't you just do slice assignment? Make that line: seq[i:i+1] = seq[i] > And the test: Stupendous and timely work! > The results on Python 2.3.5: (maybe someone can try it on 2.4) > > recursive flatten: 23.6332723852 > flatten in place-non recursive: 22.1817641628 > recursive-no copies: 30.909762833 > smallest recursive: 35.2678756658 > non-recursive flatten in place without copies: 7.8551944451 GAAAAH! > A 300% improvement!!! > > This shows the value of avoiding copies, recursion, and extra function > calls. Specifically, it shows the value of avoiding extra function calls, since my zerocopy version is slower than the copy-happy single-function versions. Also, there are some differences between the functions which constitute potential hillocks on the playing field - i test flattenability by looking for an __iter__ method, whereas other implementations mostly ask "instanceof list?" (less general, and less in the spirit of duck typing, IMNERHO). I'd argue that my decomposition into functions is this sort of difference, too - a matter of style (good style!), not algorithm. So, levelling those differences, and throwing in my non-recursive zerocopy foolery, here's my take on it ... # ---- # here be a quick reimplementation of timeit to time function objects # no exec for me no siree bob # all you need to know is that timeit(fn) gives you time taken to run fn import sys import time TIMERS = { "win32": time.clock } timer = TIMERS.get(sys.platform, time.time) def timeit(fn, n=None): if (n == None): t = 0.1 n = 1 while (t < 1.0): n = max(int((n * min((1.0 / t), 10))), (n + 1)) t = _timeit(fn, n) else: t = _timeit(fn, n) return t / n def _timeit(fn, n): it = xrange(n) t0 = timer() for i in it: fn() t1 = timer() return float((t1 - t0)) # there is real code now # i've rewritten the functions to use uniform variable names # and to use isiterable def isiterable(obj): return hasattr(obj, "__iter__") def georges_recursive_flatten(seq): return reduce(_accum, seq, []) def _accum(a, item): if isiterable(item): a.extend(georges_recursive_flatten(item)) else: a.append(item) return a def rons_nonrecursive_flatten(seq): a = [] while seq: while isiterable(seq[0]): seq = seq[0] + seq[1:] a.append(seq.pop(0)) return a def toms_recursive_zerocopy_flatten(seq, a=[]): if (isiterable(seq)): for item in seq: toms_recursive_zerocopy_flatten(item, a) else: a.append(seq) return a def toms_iterative_zerocopy_flatten(seq): stack = [None] cur = iter(seq) a = [] while (cur != None): try: item = cur.next() if (isiterable(item)): stack.append(cur) cur = iter(item) else: a.append(item) except StopIteration: cur = stack.pop() return a def devans_smallest_recursive_flatten(seq): if (isiterable(seq)): return sum([devans_smallest_recursive_flatten(item) for item in seq], []) else: return [seq] def rons_nonrecursive_inplace_flatten(seq): i = 0 while (i != len(seq)): while (isiterable(seq[i])): seq[i:(i + 1)] = seq[i] # setslice takes iterators! i = i + 1 return seq flattens = [ georges_recursive_flatten, rons_nonrecursive_flatten, toms_recursive_zerocopy_flatten, toms_iterative_zerocopy_flatten, devans_smallest_recursive_flatten, rons_nonrecursive_inplace_flatten ] seq = [[1,2],[3],[],[4,[5,6]]] def timeflatten(flatten): return timeit(lambda: flatten(seq)) def funcname(fn): return fn.func_name print zip(map(funcname, flattens), map(timeflatten, flattens)) # ---- The output (in python 2.3 on a 1.5 GHz G4 pbook with OS X 10.3) is: [ ('georges_recursive_flatten', 0.00015331475192276888), ('rons_nonrecursive_flatten', 0.00015447115513356376), ('toms_recursive_zerocopy_flatten', 0.00012239551614106925), ('toms_iterative_zerocopy_flatten', 0.00035910996630353429), ('devans_smallest_recursive_flatten', 0.00019606360118084218), ('rons_nonrecursive_inplace_flatten', 5.8524826144294404e-05) ] So, my zerocopy flatten is, after all, faster than George, you and Devan's copy-happy versions, although not by much, my iterative version is much, much slower, and the winner is still your in-place flatten, which beats my not-too-bad 122 microseconds with a scorching 58! I guess setslice is a lot faster than i thought. How are python lists implemented? Presumably not as straightforward arrays, where inserting a bunch of items at the head would mean moving everything else in the list along? We really ought to do this benchmark with a bigger list as input - a few thousand elements, at least. But that would mean writing a function to generate random nested lists, and that would mean specifying parameters for the geometry of its nestedness, and that would mean exploring the dependence of the performance of each flatten on each parameter, and that would mean staying up until one, so i'm not going to do that. tom -- [Philosophy] is kind of like being driven behind the sofa by Dr Who - scary, but still entertaining. -- Itchyfidget -- http://mail.python.org/mailman/listinfo/python-list