Steven D'Aprano wrote:
On Sat, 26 Jul 2008 18:54:22 +0000, Robert Latest wrote:
Here's an interesting side note: After fixing my "Channel" thingy the
whole project behaved as expected. But there was an interesting hitch.
The main part revolves around another class, "Sequence", which has a
list of Channels as attribute. I was curious about the performance of my
script, because eventually this construct is supposed to handle
megabytes of data. So I wrote a simple loop that creates a new Sequence,
fills all the Channels with data, and repeats.
Interestingly, the first couple of dozen iterations went satisfyingly
quickly (took about 1 second total), but after a hundred or so it got
really slow -- a couple of seconds per iteration.
Playing around with the code, not really knowing what to do, I found
that in the "Sequence" class I had again erroneously declared a
class-level attribute -- seemingly harmless, just a string that got
assigned to once per iteration, at object creation.
After I had deleted that, the loop went blindingly fast without slowing
down.
What's the mechanics behind this behavior?
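A class-level attribute is one declared in the class body; it lives on
the class object and is shared by every instance, unlike an attribute
assigned through self in __init__. A minimal, made-up sketch of the
distinction -- not the actual Channel/Sequence code:

class Shared(object):
    samples = []                    # class-level: one list shared by all instances
    def __init__(self, data):
        self.samples.extend(data)   # grows across every instance ever created

class PerInstance(object):
    def __init__(self, data):
        self.samples = list(data)   # instance-level: a fresh list per object

With the shared variant, data can quietly accumulate across iterations,
which is one way a loop like the one described can get slower and
slower.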
Without actually seeing the code, it's difficult to be sure, but my guess
is that you were accidentally doing repeated string concatenation. This
can be very slow.
In general, anything that looks like this:
s = ''
for i in range(10000): # or any big number
    s = s + 'another string'
can be slow. Very slow.
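The reason: strings are immutable, so each s = s + 'another string'
normally has to build a brand-new string, copying everything already
accumulated in s, and the total work grows quadratically with the
number of iterations. A rough way to see the cost -- just counting the
characters that worst-case copying touches, as an illustration, not a
benchmark:

copied = 0
s = ''
for i in range(10000):
    chunk = 'another string'
    copied += len(s) + len(chunk)   # each concatenation copies all of s plus the chunk
    s = s + chunk
print(copied)   # roughly 700 million characters copied for a 140000-character result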
But this is way faster:
s = ''
for i in range(10000): # or any big number
    s += 'another string'
(snip)
It's harder to stumble across the slow behaviour these days, as Python
2.4 introduced an optimization that, under some circumstances, makes
string concatenation almost as fast as using join().
Yep: using augmented assignment (s += some_string) instead of
concatenation and rebinding (s = s + some_string).
But be warned: join()
is still the recommended approach. Don't count on this optimization to
save you from slow code.
If you want to see just how slow repeated concatenation is compared to
joining, try this:
>>> import timeit
>>> t1 = timeit.Timer('for i in xrange(1000): x=x+str(i)+"a"', 'x=""')
>>> t2 = timeit.Timer('"".join(str(i)+"a" for i in xrange(1000))', '')
>>> t1.repeat(number=30)
[0.8506159782409668, 0.80239105224609375, 0.73254203796386719]
>>> t2.repeat(number=30)
[0.052678108215332031, 0.052067995071411133, 0.052803993225097656]
Concatenation is more than ten times slower in the example above.
Not when using augmented assignment:
>>> from timeit import Timer
>>> t1 = Timer('for i in xrange(1000): x+= str(i)+"a"', 'x=""')
>>> t2 = Timer('"".join(str(i)+"a" for i in xrange(1000))', '')
>>> t1.repeat(number=30)
[0.07472991943359375, 0.064207077026367188, 0.064996957778930664]
>>> t2.repeat(number=30)
[0.071865081787109375, 0.061071872711181641, 0.06132817268371582]
(snip)
And even worse:
>>> t1.repeat(number=50)
[2.7190279960632324, 2.6910948753356934, 2.7089321613311768]
>>> t2.repeat(number=50)
[0.087616920471191406, 0.088094949722290039, 0.087819099426269531]
Not that much worse here:
>>> t1.repeat(number=50)
[0.12305188179016113, 0.10764503479003906, 0.10605692863464355]
>>> t2.repeat(number=50)
[0.11200308799743652, 0.10315108299255371, 0.10278487205505371]
I'd still advise using the sep.join(seq) approach, but not because of
performance.
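For completeness, the usual shape of the join() approach -- collect the
pieces in a list and build the string once at the end (a generic
sketch, not code from this thread):

pieces = []
for i in xrange(1000):
    pieces.append(str(i) + "a")
result = "".join(pieces)

It reads as "collect, then combine", and it doesn't rely on the
concatenation optimization mentioned above, which is specific to the
CPython implementation.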