Alex Martelli wrote:
> Of course, hoisting the unbound method out of the loops can afford the
> usual small optimization. But my point is that, in Python, these
> operations (like, say, the concatenation of a sequence of lists, etc)
> are best performed "in place" via loops calling mutator methods such as
> update and intersection_update (or a list's extend, etc), rather than
> "functionally" (building and tossing away many intermediate results).
> E.g., consider:
>
> brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
> range(0, 100, 3)]' 'r=set()' 'for x in sos: r.update(x)'
> 100000 loops, best of 3: 18.8 usec per loop
>
> brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
> range(0, 100, 3)]' 'r=reduce(set.union, sos, set())'
> 10000 loops, best of 3: 87.2 usec per loop
>
> Even in a case as tiny as this one, "reduce" is taking over 4 times
> longer than the loop with the in-place mutator -- and it only gets
> worse, as we're talking about O(N squared) vs O(N) performance!
> Indeed, this is part of what makes reduce an "attractive
> nuisance"...;-). [[And so is sum, if used OTHERWISE than for the
> documented purpose, computing "the sum of a sequence of numbers": a
> loop with r.extend is similarly faster, to concatenate a sequence of
> lists, when compared to sum(sol, [])...!!!]]
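To make the sum-vs-extend aside concrete, here is a minimal sketch of
the two approaches; the helper name concat_all and the sample data are
mine, not from the quoted post:

def concat_all(sol):
    # One growing result list; each extend appends len(l) items,
    # so the total work is O(total number of elements) -- linear.
    result = []
    for l in sol:
        result.extend(l)
    return result

sol = [range(10) for x in range(100)]  # sample data, mine

# sum(sol, []) builds a brand-new list at every addition, re-copying
# everything accumulated so far: O(N squared) in the total size.
assert sum(sol, []) == concat_all(sol)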
Curiously, when I attempt the actual problem at hand (set intersection)
rather than something altogether different (set union) on my Dell laptop
running Win2K Pro, I get the following results:

stmt = 'r=reduce(set.intersection, los)'
best of 3: 21.9 usec per loop
stmt = 'r=intersect_all(los)'
best of 3: 29.3 usec per loop

So, the "nuisance" is more attractive than you thought. Apparently, one
can't really depend on "in place" mutator methods being more efficient
than using functional techniques.

BTW, the above figures reflect 10,000 iterations; using larger counts
makes the difference even larger. I suspect that this is because the
Python interpreter has fewer chances to be interrupted by external
processes when it's using 'reduce'. This in turn indicates that both
implementations actually do about the same amount of work, and that
your "O(N squared)" argument is irrelevant here.

P.S. Here's the source I used for the timings:

from timeit import Timer

setup = """
def intersect_all(los):
    it = iter(los)
    result = set(it.next())
    for s in it:
        result.intersection_update(s)
    return result

not7 = range(2, 11)
not7.remove(7)
los = [set(range(0, 360, step)) for step in not7]
"""

number = 10000
repeat = 3
precision = 3

for stmt in 'r=reduce(set.intersection, los)', 'r=intersect_all(los)':
    t = Timer(stmt, setup)
    r = t.repeat(repeat, number)
    best = min(r)
    usec = best * 1e6 / number
    print "stmt =", repr(stmt)
    print "best of %d: %.*g usec per loop" % (repeat, precision, usec)
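One plausible factor, for what it's worth (my reading, not claimed in
the post above): with intersection the running result can only shrink,
so the intermediate sets that reduce builds and discards stay tiny,
whereas in the union benchmark they grow at every step. A quick sketch
with the same 'los' as the P.S. shows the shrinkage:

not7 = range(2, 11)
not7.remove(7)
los = [set(range(0, 360, step)) for step in not7]

result = set(los[0])
sizes = [len(result)]
for s in los[1:]:
    # this is the step reduce(set.intersection, los) repeats: build a
    # fresh result set rather than mutating the accumulator in place
    result = result.intersection(s)
    sizes.append(len(result))

print sizes  # [180, 60, 30, 6, 6, 3, 1, 1]: the intermediates collapse fast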