* Gerald Britton:
Yesterday I stumbled across some old code in a project I was working
on. It does something like this:
mystring = '\n'.join([line for line in lines if <some conditions depending on line>])
where "lines" is a simple list of strings. I realized that the code
had been written before you could put a list comprehension as an
argument to join(). I figured that I could drop the '[' and ']' and
leave the rest as a list comprehension since that works with current
Python (and the project's base is 2.5). So I rewrote the original
statement like this:
mystring = '\n'.join(line for line in lines if <some conditions depending on line>)
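For concreteness, here is a runnable toy version; the startswith()
filter is just a hypothetical stand-in for the real condition, which
I've elided:

lines = ['alpha', '# comment', 'beta', 'gamma']
# Hypothetical filter standing in for <some conditions depending on line>:
mystring = '\n'.join(line for line in lines if not line.startswith('#'))
print mystring      # prints alpha, beta, gamma on separate lines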
The rewritten version works as expected. Then I got curious about how
it performs, and I was surprised to find that it runs more than twice
as _slowly_, e.g.:
>>> from timeit import Timer
>>> l = map(str, range(10))
>>> l
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
2.9967339038848877
>>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
7.2045478820800781
Notice that I dropped the condition testing that was in my original
code; I just wanted to compare the two expressions themselves. I
thought that maybe there was some lower bound on the number of items
beyond which the bracketless form would prove more efficient, but
there doesn't appear to be one: I scaled the length of the input list
up to a million items and got more or less the same relative
performance.
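A quick way to check is to parameterize the setup. Here is a minimal
sketch (Python 2, as in the rest of this thread; the sizes and the
repeat count are arbitrary):

from timeit import Timer

# Time both spellings at several input sizes and report the ratio.
for n in (10, 1000, 100000):
    setup = 'l = map(str, range(%d))' % n
    listcomp = Timer("' '.join([x for x in l])", setup).timeit(number=10000)
    genexpr = Timer("' '.join(x for x in l)", setup).timeit(number=10000)
    print n, listcomp, genexpr, genexpr / listcomp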
Now I'm really curious and I'd like to know:
1. Can anyone else confirm this observation?
>>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
5.8625191190500345
>>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
12.093135300715574
2. Why should the "pure" list comprehension be slower than the same
comprehension enclosed in '[...]' ?
Regarding (2): the unbracketed expression in the join() call is *not*
a list comprehension but a generator expression.
As such, join() ends up calling next() on the generator object once per item,
and each next() call involves a light-weight context switch into and out of
the generator's frame.
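Spelled out, that consumption loop looks roughly like this (a sketch of the
iterator protocol, not of CPython's actual C implementation of join):

gen = (x for x in ['0', '1', '2'])   # generator object; nothing computed yet
pieces = []
while True:
    try:
        pieces.append(gen.next())    # each call resumes the generator's frame
    except StopIteration:            # raised when the generator is exhausted
        break
result = ' '.join(pieces)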
In addition, the docs describe generator expressions as evaluated lazily, one
item per request, and that per-item bookkeeping may also contribute to the
overhead.
In contrast, I think the interpreter can evaluate a list comprehension, [x
for x in blah], inline, without any context switching, by transforming it
into an equivalent loop with the target expression innermost.
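Roughly, that transformation amounts to the following (a sketch; the actual
bytecode differs in detail):

blah = ['0', '1', '2']
# What [x for x in blah] effectively expands to, inline in the
# current frame -- no generator object, no frame switching:
result = []
for x in blah:
    result.append(x)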
So the main factor that could slow down a list comprehension would, I think,
be memory pressure and paging if the list it produces is really huge, while
the generator has no such memory cost but instead pays for all the calling
and context switching.
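That trade-off is easiest to see with a consumer that genuinely streams, such
as sum() (a sketch; the size is arbitrary, and xrange is the Python 2
spelling):

# Materializes a ten-million-element list first, then sums it:
total = sum([x for x in xrange(10000000)])

# Produces one integer at a time, so peak memory stays small -- at
# the price of a next() call and frame switch per item:
total = sum(x for x in xrange(10000000))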
Cheers & hth.,
- Alf
--
http://mail.python.org/mailman/listinfo/python-list