Yesterday I stumbled across some old code in a project I was working on. It does something like this:
    mystring = '\n'.join([line for line in lines if <some conditions depending on line>])

where "lines" is a simple list of strings.  I realized that the code had
been written before you could pass a bare generator expression to join(),
i.e. before generator expressions were added to the language.  I figured I
could drop the '[' and ']' and turn the list comprehension into a generator
expression, since that works with current Python (and the project's base is
2.5).  So I rewrote the original statement like this:

    mystring = '\n'.join(line for line in lines if <some conditions depending on line>)

It works as expected.  Then I got curious about how it performs, and I was
surprised to learn that the rewritten expression runs more than twice as
_slowly_.  For example:

>>> l
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
2.9967339038848877
>>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
7.2045478820800781

Notice that I dropped the condition testing that was in my original code;
I just wanted to see the effect of the two different expressions.

I thought that maybe there was some lower bound on the number of items
beyond which the generator expression would prove more efficient, but there
doesn't appear to be one.  I scaled the length of the input list up to a
million items and got more or less the same relative performance.

Now I'm really curious, and I'd like to know:

1. Can anyone else confirm this observation?  (A self-contained version of
   the timing is appended below, for anyone who wants to try it.)

2. Why should the bare generator expression be slower than the same
   expression enclosed in '[...]' as a list comprehension?

-- 
Gerald Britton
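
Here is the self-contained version of the timing mentioned above.  The two
statements being timed are exactly the ones from the interactive session;
the setup line and the best-of-3 repeat are just my own packaging to reduce
noise, and absolute numbers will of course vary by machine and Python build.

    # Time the same two expressions as in the interactive session above.
    # repeat() runs each measurement three times; take the minimum to
    # reduce the effect of background noise.
    from timeit import Timer

    setup = 'l = [str(i) for i in range(10)]'

    tests = [
        ('list comprehension',   "' '.join([x for x in l])"),
        ('generator expression', "' '.join(x for x in l)"),
    ]

    for label, stmt in tests:
        best = min(Timer(stmt, setup).repeat(repeat=3, number=1000000))
        print('%-22s %.3f s' % (label, best))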