I did some timings of ''.join(<list comprehension>) vs. ''.join(<generator expression>) and found that generator expressions were slightly slower, so I looked at the source code to find out why. It turns out that the very first thing string_join(self, orig) does is:
    seq = PySequence_Fast(orig, "");

thus iterating over your generator expression and creating a list, making it less efficient than passing a list in the first place via a list comprehension. The reason it does this is exactly what you said: it iterates over the sequence to get the sum of the lengths, adds the length of n-1 separators, and then allocates a string of that size. Then it iterates over the list again to build up the string.

For generators, you'd have to make a trial allocation and start appending stuff as you go, periodically resizing. This *might* end up being more efficient in the case of generators, but the only way to know for sure is to write the code and benchmark it. I will be at PyCon 2005 during the sprint days, so maybe I'll write it then if someone doesn't beat me to it; I don't think it'll be all that hard. It might be best done as an iterjoin() method, analogous to iteritems(), or maybe xjoin() (like xrange() and xreadlines()).

Incidentally, I was inspired to do the testing in the first place by this: http://www.skymind.com/~ocrow/python_string/ Those tests were done with Python 2.3. With 2.4, naive appending (i.e. doing s1 += s2 in a loop) is about 13-15% slower than a list comprehension, but uses much less memory (for large loops); and a generator expression is about 7% slower and uses slightly *more* memory.
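For anyone who wants to reproduce the comparison, here is a minimal sketch of the kind of timing I ran. The function names, the loop size, and the repeat count are my own choices, not anything from CPython; the three bodies are just the three strategies discussed above (join over a list comprehension, join over a generator expression, and naive += in a loop).

```python
import timeit

N = 10_000  # arbitrary problem size chosen for this sketch


def join_listcomp(n=N):
    # join over a list comprehension: join() can walk the list once to
    # sum the lengths, allocate the result, then fill it in
    return ''.join([str(i) for i in range(n)])


def join_genexpr(n=N):
    # join over a generator expression: CPython's join materializes it
    # into a list first (via PySequence_Fast), so a list gets built anyway
    return ''.join(str(i) for i in range(n))


def naive_append(n=N):
    # naive repeated concatenation in a loop
    s = ''
    for i in range(n):
        s += str(i)
    return s


if __name__ == '__main__':
    # sanity check: all three strategies build the same string
    assert join_listcomp() == join_genexpr() == naive_append()
    for fn in (join_listcomp, join_genexpr, naive_append):
        t = timeit.timeit(fn, number=100)
        print(f'{fn.__name__}: {t:.3f}s')
```

The absolute numbers will vary by interpreter version and machine, so only the relative ordering is meaningful.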