Steve Holden wrote:
> you should diff your source against the current SVN repository and
> lodge that diff as a patch on SourceForge.
Okay, I'll try to do that today.

> Your suggested bug isn't, I think, a real bug in the current
> implementation, because as I understand it Python strings do always
> include a trailing null byte to act as a terminator when the string
> is passed to C code in extensions.

Ah! Excellent point. So the Python source tree *does* have an official
position on what the first byte of a zero-length string should be.
I'll fix my code so that's ensured.

> Nice idea, though. You might also see how it does on tasks like
>
>     s = ""
>     for i in range(100000):
>         s += "a"

Sure. Here are the results, but with range(10000000):

    Python 2.5 release:       31.0s
    Python 2.5 locally built: 30.7s
    Python 2.5 concat:         4.4s
    % improvement: *697%*!

Whoa! Thanks for suggesting a benchmark where it really shines. :)

The real big win here is storing eight pointers per concatenation
object. That means 1/8th as many objects are created and destroyed.

In testing your suggested benchmark, it overflowed the stack when
freeing the concatenation tree. So I changed deallocation of string
concatenation objects to be iterative down the left-hand side, like
rendering; now it's happy again. Thanks again for suggesting the test.

It would still blow up if you ran

    s = ""
    for i in range(10000000):
        s = "a" + s

because it's only clever about recursion down the left-hand side. So
my advice is: don't. :)

> Also note that some optimisation has recently been performed on string
> concatenation, which I presume your code has already taken into account.

Totally. I'm just glad there was something left for me to contribute!

Rob Williscroft wrote:
> On the python 3k list there is a thread about strings becoming
> "views", I think this is similar to what you are doing here.

As I understand it, sort of. It's definitely closer to what I describe
in my section on THE FUTURE: adding another subtype of string for
slices which point into other strings. I believe views are more
sophisticated, however.
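To make the "eight pointers per concatenation object" idea concrete, here's a rough pure-Python sketch of the approach. (The real patch is C code inside CPython's string object; the class and method names below are invented purely for illustration.)

```python
class LazyConcat:
    """Invented illustration of a lazy concatenation node; not the real patch."""

    MAX_CHILDREN = 8  # "eight pointers per concatenation object"

    def __init__(self, children):
        self.children = list(children)

    def __add__(self, other):
        # Widen the node (up to eight children) rather than deepening
        # the tree on every '+': far fewer nodes created and destroyed.
        if len(self.children) < self.MAX_CHILDREN:
            return LazyConcat(self.children + [other])
        return LazyConcat([self, other])

    def render(self):
        # Flatten with an explicit stack, not recursion, so a deep
        # left-leaning tree can't overflow the (C) call stack.
        parts, stack = [], [self]
        while stack:
            node = stack.pop()
            if isinstance(node, LazyConcat):
                stack.extend(reversed(node.children))  # keep left-to-right order
            else:
                parts.append(node)
        return "".join(parts)

s = LazyConcat([""])
for _ in range(100000):
    s = s + "a"                      # each '+' just records operands
assert s.render() == "a" * 100000    # the one real copy happens here
```

The explicit-stack render() is the pure-Python analogue of the iterative left-hand-side traversal mentioned above for rendering and deallocation: it keeps a deep tree from exhausting the stack.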
Felipe Almeida Lessa wrote:
> What about "a + b"? Or "a + b + c"? Does it have a large overhead on
> small string concatenations?

It's *slightly* slower for two:

    def addTwoThings(a, b):
        return a + b
    for i in range(10000000):
        x = addTwoThings("aaa", "bbb")

    Python 2.5 release:       6.219s
    Python 2.5 locally built: 6.219s
    Python 2.5 concat:        6.234s
    % improvement: -0.03%

But it starts paying off already, even with three:

    def addThreeThings(a, b, c):
        return a + b + c
    for i in range(10000000):
        x = addThreeThings("aaa", "bbb", "ccc")

    Python 2.5 release:       7.9s
    Python 2.5 locally built: 7.7s
    Python 2.5 concat:        7.2s
    % improvement: 6.94%

Note that I was lazy and only did one benchmark run apiece. Also, keep
in mind that memory utilization will be a little higher with the
string concatenation objects.

Carl Friedrich Bolz wrote:
> Robin Becker wrote:
> > wouldn't this approach apply to other additions eg list+list,
> > seq+seq, etc.?
> no, I think it depends on strings being immutable.

Exactly.


/larry/

-- 
http://mail.python.org/mailman/listinfo/python-list
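P.S. A tiny invented illustration of Carl Friedrich's point about immutability, for anyone wondering why the trick doesn't carry over to lists: a lazy concatenation only records references to its operands, so if an operand could mutate afterwards, the eventually rendered result would silently change. Strings can't mutate, which is what makes recording references safe. (The class below is hypothetical, just to demonstrate the hazard.)

```python
class LazyListConcat:
    """Hypothetical lazy list 'concatenation' that records references, not copies."""

    def __init__(self, left, right):
        self.left, self.right = left, right  # references only; nothing is copied

    def render(self):
        return self.left + self.right        # the copy is deferred until here

a = [1, 2]
b = [3]
lazy = LazyListConcat(a, b)  # pretend this is what a lazy 'a + b' returned
eager = a + b                # real list '+' copies immediately
a.append(99)                 # mutate an operand *after* the concatenation

assert eager == [1, 2, 3]              # eager result is unaffected
assert lazy.render() == [1, 2, 99, 3]  # lazy result sees the mutation!
```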