Thanks! Good explanation.

On Tue, Jan 19, 2010 at 10:57 AM, Alf P. Steinbach <al...@start.no> wrote:
> * Gerald Britton:
>>
>> Yesterday I stumbled across some old code in a project I was working
>> on. It does something like this:
>>
>> mystring = '\n'.join( [ line for line in lines if <some conditions
>> depending on line> ] )
>>
>> where "lines" is a simple list of strings. I realized that the code
>> had been written before you could put a list comprehension as an
>> argument to join(). I figured that I could drop the '[' and ']' and
>> leave the rest as a list comprehension since that works with current
>> Python (and the project's base is 2.5). So I rewrote the original
>> statement like this:
>>
>> mystring = '\n'.join( line for line in lines if <some conditions
>> depending on line> )
>>
>> It works as expected. Then I got curious as to how it performs. I
>> was surprised to learn that the rewritten expression runs more than
>> twice as _slow_, e.g.:
>>
>>>>> l
>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>>>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
>> 2.9967339038848877
>>>>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
>> 7.2045478820800781
>>
>> Notice that I dropped the condition testing that was in my original
>> code. I just wanted to see the effect of the two different expressions.
>>
>> I thought that maybe there was some lower bound on the number of items
>> in the list or list comprehension beyond which the comprehension would
>> prove more efficient. There doesn't appear to be one. I scaled the
>> length of the input list up to 1 million items and got more or less
>> the same relative performance.
>>
>> Now I'm really curious and I'd like to know:
>>
>> 1. Can anyone else confirm this observation?
>
>>>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
> 5.8625191190500345
>>>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
> 12.093135300715574
>>>> _
>
>> 2. Why should the "pure" list comprehension be slower than the same
>> comprehension enclosed in '[...]'?
>
> Regarding (2), the unparenthesized expression in join is *not* a list
> comprehension but a generator expression.
>
> As such, it involves join calling next() on the generator object
> repeatedly, with each next() call involving a lightweight context
> switch into the generator's frame and back.
>
> In addition, the docs mumble something about "lazy" evaluation, and
> that may also contribute to the overhead.
>
> I think that, in contrast, the interpreter can evaluate a list
> comprehension, [x for x in blah], directly, without any context
> switching: it just transforms the comprehension into an equivalent
> loop with the target expression innermost.
>
> And so the main factor slowing down a list comprehension would, I
> think, be paging and such if the list it produced were Really Huge,
> while for the generator there is no memory issue but rather a lot of
> calling and context switching.
>
> Cheers & hth.,
>
> - Alf
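For anyone who wants to reproduce this, here is a self-contained version
of the benchmark. Just a sketch: the loop and the output formatting are
mine, the absolute numbers will vary by machine and Python version, and
the map()/print forms assume Python 2.x as in the rest of this thread.

from timeit import Timer

# Ten short strings, as in the examples above.
setup = 'l = map(str, range(10))'

# Same join twice: once over a list comprehension, once over a
# generator expression passed directly to join().
statements = ["' '.join([x for x in l])",
              "' '.join(x for x in l)"]

for stmt in statements:
    # timeit() runs the statement 1,000,000 times by default.
    print '%-35s %f' % (stmt, Timer(stmt, setup).timeit())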
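To make the next()-per-item point concrete, here is a rough hand-written
model of the two shapes. This is my own illustration of where the
per-item calls come from, not CPython's actual str.join implementation:

def join_list_comp(sep, lines):
    # The list comprehension builds the whole list in one tight
    # interpreter loop; join() then sees an ordinary list.
    return sep.join([line for line in lines])

def join_genexp(sep, lines):
    # With a generator expression, the items have to be pulled out
    # one at a time; each gen.next() call suspends the caller and
    # resumes the generator's frame -- the repeated context switch
    # described above.
    gen = (line for line in lines)
    pieces = []
    while True:
        try:
            pieces.append(gen.next())  # next(gen) on Python 2.6+/3.x
        except StopIteration:
            break
    return sep.join(pieces)

Either way join() ends up with a sequence of the same strings; the
difference is the per-item suspend/resume on the generator path, which
looks like where the extra time goes on small inputs.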
--
Gerald Britton
--
http://mail.python.org/mailman/listinfo/python-list