Sammo <sammo2...@gmail.com> wrote:
>  String concatenation has been optimized since 2.3, so using += should
>  be fairly fast.
>
>  In my first test, I tried concatenating a 4096 byte string 1000 times
>  in the following code, and the result was indeed very fast (12.352 ms
>  on my machine).
>
>  import time
>  t = time.time()
>  mydata = ""
>  moredata = "A"*4096
>  for i in range(1000):
>      mydata += moredata # 12.352 ms
>  print "%0.3f ms"%(1000*(time.time() - t))
>
>  However, I got a different result in my second test, which is
>  implemented in a class with a feed() method. This test took 4653.522
>  ms on my machine, which is 350x slower than the previous test!
>
>  class StringConcatTest:
>      def __init__(self):
>          self.mydata = ""
>
>      def feed(self, moredata):
>          self.mydata += moredata # 4653.522 ms
>
>  test = StringConcatTest()
>  t = time.time()
>  for i in range(1000):
>      test.feed(moredata)
>  print "%0.3f ms"%(1000*(time.time() - t))
>
>  Note that I need to do something to mydata INSIDE the loop, so please
>  don't tell me to append moredata to a list and then use "".join after
>  the loop.
>
>  Why is the second test so much slower?

The optimized += depends on there being no other references to the
string.

Strings are immutable in Python, so an append must normally return a
new string. However, the += operation was optimised to do an in-place
append if and only if there are no other references to the string.

You can see this demonstrated here:

  $ python -m timeit -s 'a="x"' 'a+="x"'
  1000000 loops, best of 3: 0.231 usec per loop

  $ python -m timeit -s 'a="x"; b=a' 's = a; a+="x"'
  100000 loops, best of 3: 30.1 usec per loop

You are keeping the extra reference in a class instance, like this:

  $ python -m timeit -s 'class A(object): pass' -s 'a=A(); a.a="x"' 'a.a+="x"'
  100000 loops, best of 3: 30.7 usec per loop

Knowing that, this optimization suggests itself:

  $ python -m timeit -s 'class A(object): pass' -s 'a=A(); a.a="x"' 's = a.a; a.a = None; s += "x"; a.a = s'
  1000000 loops, best of 3: 0.954 usec per loop

Or in your example:

  import time

  class StringConcatTest:
      def __init__(self):
          self.mydata = ""
      def feed(self, moredata):
          #self.mydata += moredata
          s = self.mydata    # take a local reference
          del self.mydata    # drop the instance's reference
          s += moredata      # s is now the only reference: in-place append
          self.mydata = s    # store the result back on the instance

  moredata = "A"*4096
  test = StringConcatTest()
  t = time.time()
  for i in range(1000):
      test.feed(moredata)
  print "%0.3f ms"%(1000*(time.time() - t))

Before it was 3748.012 ms on my PC, afterwards it was 52.737 ms.

However, that isn't a perfect solution - what if something else holds
another reference to self.mydata? Then the += falls back to copying
again.

You really want a mutable string for this use. array.array is a
possibility:

  $ python -m timeit -s 'import array' -s 'a = array.array("c")' 'a.extend("x")'
  100000 loops, best of 3: 2.01 usec per loop

There are many other possibilities though, like the mmap module.

-- 
Nick Craig-Wood <n...@craig-wood.com> -- http://www.craig-wood.com/nick
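
On the mutable-buffer suggestion above, here is a minimal sketch of the same feed() pattern built on bytearray rather than array.array("c"). This is a swapped-in technique, not what was timed above: bytearray needs Python 2.6 or later, and the class name BufferConcatTest is made up for illustration. The point is only that += on a mutable buffer extends it in place no matter how many references exist.

```python
class BufferConcatTest:
    """Hypothetical variant of StringConcatTest using a mutable buffer."""

    def __init__(self):
        # bytearray is mutable, so appends never need to copy the old data
        # into a brand-new object.
        self.mydata = bytearray()

    def feed(self, moredata):
        # In-place extend; the same bytearray object is kept on the instance
        # even if other references to it exist elsewhere.
        self.mydata += moredata


moredata = b"A" * 4096
test = BufferConcatTest()
for i in range(1000):
    test.feed(moredata)

print(len(test.mydata))
```

Anything that needs an immutable snapshot inside the loop can take bytes(test.mydata), at the cost of one copy per snapshot.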