Steven D'Aprano wrote:
On Fri, 12 Mar 2010 00:11:37 -0700, Zooko O'Whielacronx wrote:
> Folks:
>
> Every couple of years I run into a problem where some Python code that
> worked well at small scales starts burning up my CPU at larger scales,
> and the underlying issue turns out to be the idiom of accumulating data
> by string concatenation.
I don't mean to discourage you, but the simple way to avoid that is not
to accumulate data by string concatenation.
The usual Python idiom is to append substrings to a list, then once, at
the very end, combine into a single string:
accumulator = []
for item in sequence:
    accumulator.append(process(item))
string = ''.join(accumulator)
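To make the difference concrete, here's a rough timing harness (my own
sketch, nothing from the thread; the function names are invented). One
caveat: CPython can special-case in-place str concatenation when the
target has no other references, so the naive version may not always look
quadratic there; the join idiom is the portable guarantee.

import timeit

def by_concat(n):
    # Naive accumulation: each += may copy everything built so far,
    # so the total work can grow as O(n**2).
    s = ''
    for i in range(n):
        s += 'x' * 10
    return s

def by_join(n):
    # List idiom: collect the pieces, copy them exactly once at the
    # end, for O(n) total work.
    parts = []
    for i in range(n):
        parts.append('x' * 10)
    return ''.join(parts)

for n in (10000, 100000):
    print(n,
          timeit.timeit(lambda: by_concat(n), number=3),
          timeit.timeit(lambda: by_join(n), number=3))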
> It just happened again
> (http://foolscap.lothar.com/trac/ticket/149), and as usual it is hard
> to make the data accumulator efficient without introducing a bunch of
> bugs into the surrounding code.
I'm sorry, I don't agree with that at all. I've never come across a
situation where I wanted to use string concatenation and couldn't easily
modify it to use the list idiom above.
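The conversion is usually mechanical. A sketch (the function and its
names are invented for illustration):

# Before: accumulate by concatenation.
def render(records):
    out = ''
    for rec in records:
        out += str(rec) + '\n'
    return out

# After: append the pieces to a list, join once at the end.
def render(records):
    parts = []
    for rec in records:
        parts.append(str(rec))
        parts.append('\n')
    return ''.join(parts)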
[...]

> Here are some benchmarks generated by running python -OOu -c 'from
> stringchain.bench import bench; bench.quick_bench()' as instructed by
> the README.txt file.
To be taken seriously, I think you need to compare stringchain to the
list idiom. If your benchmarks compare favourably to that, then it might
be worthwhile.
IIRC, someone did some work on making string concatenation faster by
delaying the actual copying until a certain threshold had been reached,
inside the string class implementation itself.
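If it helps, the idea can be sketched at the Python level. This is a toy
of my own, not the actual patch, and the threshold value is invented:

class LazyConcat(object):
    """Toy sketch of delayed concatenation: '+' merely records the
    operands instead of copying, and the single real join is deferred
    until the string value is actually needed. Only plain str operands
    on the right are supported, to keep the sketch short."""

    THRESHOLD = 64  # invented cutoff: below this, eager concat is cheap

    def __init__(self, s=''):
        self._pieces = [s]     # flat list of ordinary strings
        self._length = len(s)

    def __add__(self, other):
        result = LazyConcat()
        if self._length + len(other) < self.THRESHOLD:
            # Small result: concatenate eagerly, as str normally does.
            result._pieces = [str(self) + other]
        else:
            # Large result: defer the copy, just remember the piece.
            # (A real implementation would use a tree so this is O(1).)
            result._pieces = self._pieces + [other]
        result._length = self._length + len(other)
        return result

    def __len__(self):
        return self._length

    def __str__(self):
        if len(self._pieces) > 1:
            # The one real concatenation; render once and cache it.
            self._pieces = [''.join(self._pieces)]
        return self._pieces[0]

A real implementation (and stringchain too, presumably) has to handle
slicing, interaction with C code, and so on, but the deferral idea is
the same.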
--
http://mail.python.org/mailman/listinfo/python-list