On Wed, 20 Jan 2010 05:25:22 +0100, Alf P. Steinbach wrote: > * Steven D'Aprano: >> On Tue, 19 Jan 2010 16:20:42 -0500, Gerald Britton wrote: >> >>> That's surprising. I wouldn't implement it that way at all. I'd use a >>> dynamically-expanding buffer as I suggested. That way you get a >>> single pass and don't have to calculate anything before you begin. In >>> the best case, you'd use half the memory (the result just fits in the >>> buffer after its last expansion and no saved tuple). In the worst >>> case, the memory use is about the same (you just expanded the buffer >>> using a 2x expansion rule then you hit the last item). >> >> In the worst case, you waste 50% of the memory allocated. > > Yes. That is a good argument for not doing the expanding buffer thing. > But such buffers may be generally present anyway, resulting from > optimization of "+".
As near as I can determine, the CPython optimization you are referring to doesn't use the "double the buffer when needed" technique. It operates on a completely different strategy. As near as I can tell (as a non-C speaker), it re-sizes the string in place to the size actually needed, thus reducing the amount of copying needed. The optimization patch is here: http://bugs.python.org/issue980695 and some history (including Guido's opposition to the patch) here: http://mail.python.org/pipermail/python-dev/2004-August/046686.html Nevertheless, the patch is now part of CPython since 2.4, but must be considered an implementation-specific optimization. It doesn't apply to Jython, and likely other implementations. > Using CPython 2.6.4 in Windows XP: [snip time measurements] > Here the linear increase of times indicate that the "+" is being > optimized using an expanding buffer for the string. Be careful of jumping to the conclusion from timings that a certain algorithm is used. All you can really tell is that, whatever the algorithm is, it has such-and-such big-oh behaviour. > If only the minimal > space was allocated each time then one would expect O(n^2) behavior > instead of the apparently O(n) above. I don't follow that reasoning. > Example of that O(n^2) behavior given below. The example shown demonstrates that the + optimization only applies in certain specific cases. In fact, it only applies to appending: s += t s = s + t but not prepending or concatenation with multiple strings: s = t + s s += t + u However, keep in mind that the CPython keyhole optimizer will take something like this: s += "x" + "y" and turn it into this: s += "xy" which the concatenation optimization does apply to. Optimizations make it hard to reason about the behaviour of algorithms! -- Steven -- http://mail.python.org/mailman/listinfo/python-list