Steven D'Aprano wrote:
On Fri, 12 Mar 2010 00:11:37 -0700, Zooko O'Whielacronx wrote:
> Folks:
>
> Every couple of years I run into a problem where some Python code that
> worked well at small scales starts burning up my CPU at larger scales,
> and the underlying issue turns out to be the idiom of accumulating data
> by string concatenation.
I don't mean to discourage you, but the simple way to avoid that is not
to accumulate data by string concatenation.
The usual Python idiom is to append substrings to a list, then once, at
the very end, combine into a single string:
accumulator = []
for item in sequence:
    accumulator.append(process(item))
string = ''.join(accumulator)
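To make the difference concrete, here's a rough timing harness (my own
sketch, nothing from the thread; the function names are invented). One
caveat: CPython can special-case in-place str concatenation when the
target has no other references, so the naive version may not always look
quadratic there; the join idiom is the portable guarantee.

import timeit

def by_concat(n):
    # Naive accumulation: each += may copy everything built so far,
    # so the total work can grow as O(n**2).
    s = ''
    for i in range(n):
        s += 'x' * 10
    return s

def by_join(n):
    # List idiom: collect the pieces, copy them exactly once at the
    # end, for O(n) total work.
    parts = []
    for i in range(n):
        parts.append('x' * 10)
    return ''.join(parts)

for n in (10000, 100000):
    print(n,
          timeit.timeit(lambda: by_concat(n), number=3),
          timeit.timeit(lambda: by_join(n), number=3))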
> It just happened again
> (http://foolscap.lothar.com/trac/ticket/149), and as usual it is hard
> to make the data accumulator efficient without introducing a bunch of
> bugs into the surrounding code.
I'm sorry, I don't agree with that at all. I've never come across a
situation where I wanted to use string concatenation and couldn't easily
modify it to use the list idiom above.
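The conversion is usually mechanical. A sketch (the function and its
names are invented for illustration):

# Before: accumulate by concatenation.
def render(records):
    out = ''
    for rec in records:
        out += str(rec) + '\n'
    return out

# After: append the pieces to a list, join once at the end.
def render(records):
    parts = []
    for rec in records:
        parts.append(str(rec))
        parts.append('\n')
    return ''.join(parts)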
[...]

> Here are some benchmarks generated by running python -OOu -c 'from
> stringchain.bench import bench; bench.quick_bench()' as instructed by
> the README.txt file.
To be taken seriously, I think you need to compare stringchain to the
list idiom. If your benchmarks compare favourably to that, then it might
be worthwhile.
IIRC, someone did some work on making string concatenation faster by
delaying the actual copying until a certain threshold had been reached,
inside the string class implementation itself.
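If it helps, the idea can be sketched at the Python level. This is a toy
of my own, not the actual patch, and the threshold value is invented:

class LazyConcat(object):
    """Toy sketch of delayed concatenation: '+' merely records the
    operands instead of copying, and the single real join is deferred
    until the string value is actually needed. Only plain str operands
    on the right are supported, to keep the sketch short."""

    THRESHOLD = 64  # invented cutoff: below this, eager concat is cheap

    def __init__(self, s=''):
        self._pieces = [s]     # flat list of ordinary strings
        self._length = len(s)

    def __add__(self, other):
        result = LazyConcat()
        if self._length + len(other) < self.THRESHOLD:
            # Small result: concatenate eagerly, as str normally does.
            result._pieces = [str(self) + other]
        else:
            # Large result: defer the copy, just remember the piece.
            # (A real implementation would use a tree so this is O(1).)
            result._pieces = self._pieces + [other]
        result._length = self._length + len(other)
        return result

    def __len__(self):
        return self._length

    def __str__(self):
        if len(self._pieces) > 1:
            # The one real concatenation; render once and cache it.
            self._pieces = [''.join(self._pieces)]
        return self._pieces[0]

A real implementation (and stringchain too, presumably) has to handle
slicing, interaction with C code, and so on, but the deferral idea is
the same.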
--
http://mail.python.org/mailman/listinfo/python-list