Sammo wrote:
> Okay, this is what I have tried for string concatenation:
>
> 1. Using += implemented using simple operations (12 ms)
> 2. Using += implemented inside a class (4000+ ms)
> 3. Using "".join implemented using simple operations (4000+ ms)
> 4. Using "".join implemented inside a class (4000+ ms)
I think you're doing more work than you need to. Here are my results:

>>> def join_test():
...     L = []
...     block = "abcdefghijklmnopqrstuvwxyz"*1000
...     for i in xrange(1000):
...         L.append(block)
...     return ''.join(L)
...
>>> def concat_test():
...     s = ""
...     block = "abcdefghijklmnopqrstuvwxyz"*1000
...     for i in xrange(1000):
...         s += block
...     return s
...
>>> assert join_test() == concat_test()
>>>
>>> from timeit import Timer
>>> Timer('concat_test()',
... 'from __main__ import concat_test').repeat(number=100)
[5.397799015045166, 4.3106229305267334, 4.3492171764373779]
>>> Timer('join_test()',
... 'from __main__ import join_test').repeat(number=100)
[4.5378072261810303, 3.9887290000915527, 3.919511079788208]

The join() idiom is slightly faster than repeated concatenation, even
with the += optimization. If you're not seeing that result, you need
to consider what extra work you're doing that you don't need to.

>> Then why not just mutate the list and then call "".join?
>
> AFAIK, using list mutation and "".join only improves performance if
> the "".join is executed outside of the loop.

Naturally. If you needlessly join over and over again, instead of
delaying until the end, then you might as well do string concatenation
without the optimization. join() isn't some magical function that
makes your bugs go away. You have to use it sensibly. What you've done
is a variant of Shlemiel the road-painter's algorithm:

http://www.joelonsoftware.com/articles/fog0000000319.html

Perhaps you have to repeatedly do work on the *temporary* results in
between iterations? Something like this toy example?

s = ""
block = "abcdefghijklmnopqrstuvwxyz"*1000
for i in xrange(1000):
    s += block
    # do something with the partially concatenated string
    print "partial length is", len(s)
# all finished
do_something(s)

You've written that using join:

L = []
block = "abcdefghijklmnopqrstuvwxyz"*1000
for i in xrange(1000):
    L.append(block)
    # do something with the partially concatenated string
    L = [''.join(L)]
    print "partial length is", len(L[0])
# all finished
s = ''.join(L)
do_something(s)

Okay, but we can re-write that more sensibly:

L = []
block = "abcdefghijklmnopqrstuvwxyz"*1000
for i in xrange(1000):
    L.append(block)
    # do something with the partially concatenated string
    print "partial length is", sum(len(item) for item in L)
# all finished
s = ''.join(L)
do_something(s)

There's still a Shlemiel algorithm in there, but it's executed in fast
C instead of slow Python, and it doesn't involve copying large blocks
of memory, only walking over them, so you won't notice it as much. Can
we be smarter?

L = []
block = "abcdefghijklmnopqrstuvwxyz"*1000
partial_length = 0
for i in xrange(1000):
    L.append(block)
    partial_length += len(block)
    # do something with the partially concatenated string
    print "partial length is", partial_length
# all finished
s = ''.join(L)
do_something(s)

Naturally this is a toy example, but I think it illustrates one
technique for avoiding turning an O(N) algorithm into an O(N**2)
algorithm.

Good luck!

--
Steven
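
P.S. If you want to see the quadratic behaviour directly, here's a
quick sketch of my own (nothing below comes from your code; both
function names are invented for illustration). Doubling n should
roughly double the time of the join-once version and roughly quadruple
the time of the join-every-iteration version:

from timeit import Timer

def join_every_iteration(n):
    # Shlemiel style: collapse the whole list on every pass, so each
    # iteration re-copies everything accumulated so far -- O(N**2).
    L = []
    for i in xrange(n):
        L.append("x" * 100)
        L = [''.join(L)]
    return L[0]

def join_once(n):
    # Sensible style: append in the loop, join exactly once at the
    # end -- O(N).
    L = []
    for i in xrange(n):
        L.append("x" * 100)
    return ''.join(L)

for n in (500, 1000, 2000):
    shlemiel = Timer('join_every_iteration(%d)' % n,
        'from __main__ import join_every_iteration').timeit(number=10)
    sensible = Timer('join_once(%d)' % n,
        'from __main__ import join_once').timeit(number=10)
    print "n=%5d: join every iteration %.3fs, join once %.3fs" % (
        n, shlemiel, sensible)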
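
P.P.S. I have no idea what your class-based versions actually did, but
if the class was joining (or concatenating) on every append, that
alone would explain the 4000+ ms. A class only pays its way if it
defers the join. Something like this sketch, say -- every name in it
is my own invention, not anything from your code:

class StringAccumulator(object):
    # Stores fragments and a running total length; the expensive join
    # happens only when somebody asks for the final value.
    def __init__(self):
        self._parts = []
        self._length = 0
    def append(self, text):
        self._parts.append(text)
        self._length += len(text)
    def __len__(self):
        # Answer "how long so far?" without joining anything.
        return self._length
    def value(self):
        # Collapse to a single fragment so repeated calls stay cheap.
        self._parts = [''.join(self._parts)]
        return self._parts[0]

acc = StringAccumulator()
block = "abcdefghijklmnopqrstuvwxyz" * 1000
for i in xrange(1000):
    acc.append(block)
    # do something with the partial result -- cheap, no join happens
    print "partial length is", len(acc)
# all finished
s = acc.value()

The trick is the same as in the last toy example above: keep the
bookkeeping (the running length) separate from the payload, and join
once.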