* Steven D'Aprano:
> On Tue, 19 Jan 2010 16:20:42 -0500, Gerald Britton wrote:
>
>> That's surprising. I wouldn't implement it that way at all.  I'd use a
>> dynamically-expanding buffer as I suggested.  That way you get a single
>> pass and don't have to calculate anything before you begin.  In the best
>> case, you'd use half the memory (the result just fits in the buffer
>> after its last expansion and no saved tuple).  In the worst case, the
>> memory use is about the same (you just expanded the buffer using a 2x
>> expansion rule then you hit the last item).
>
> In the worst case, you waste 50% of the memory allocated.

Yes, that's a good argument for not doing the expanding-buffer thing. But such buffers may be present anyway, resulting from the optimization of "+" on strings that the timings below suggest.
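
Just to illustrate the idea (this is my own sketch, not CPython's actual code), a 2x expanding buffer could look like the following; the StringBuilder name and the initial capacity of 16 are arbitrary choices of mine:

class StringBuilder:
    def __init__( self ):
        self._buf = bytearray( 16 )     # allocated capacity
        self._size = 0                  # bytes actually in use

    def append( self, s ):
        needed = self._size + len( s )
        if needed > len( self._buf ):
            # Double the capacity until the new data fits; each byte
            # then gets copied O(1) times on average (amortized).
            new_cap = len( self._buf )
            while new_cap < needed:
                new_cap = 2*new_cap
            new_buf = bytearray( new_cap )
            new_buf[:self._size] = self._buf[:self._size]
            self._buf = new_buf
        self._buf[self._size:needed] = s
        self._size = needed

    def result( self ):
        return str( self._buf[:self._size] )

b = StringBuilder()
for i in xrange( 1000 ):
    b.append( "A" )
assert b.result() == 1000*"A"

After the last expansion up to half the capacity can be unused, which is the 50% waste mentioned above.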

Using CPython 2.6.4 on Windows XP:


>>> import timeit
>>> def elapsed_time_for( f, n_calls ):
...     return timeit.timeit( f, number = n_calls )
...
>>> def appender( n ):
...     def makestr( n = n ):
...         s = ""
...         for i in xrange( n ):
...             s = s + "A"
...         return s
...     return makestr
...
>>> appender( 1000 )() == 1000*"A"
True
>>>
>>> for i in xrange( 10 ):
...     print( elapsed_time_for( appender( 10000*(i+1) ), 100 ) )
...
0.782596670811
1.37728454314
2.10189898437
2.76442173517
3.34536707878
4.08251830889
4.79620119317
5.42201844089
6.12892811796
6.84236460221
>>> _


Here the linear increase in times indicates that the "+" is being optimized using an expanding buffer for the string. If only the minimal space were allocated each time, one would expect O(n^2) behavior instead of the apparently O(n) behavior above. An example of that O(n^2) behavior is given below.
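
The back-of-the-envelope arithmetic behind that: with exact allocation the i-th append copies all i previously appended characters, so building an n-character string copies about n^2/2 characters in total, while with doubling only the reallocations copy, totalling less than 2n. A small sketch (mine) that counts the copies:

def copies_exact( n ):
    # The i-th append copies the i previous characters: 0 + 1 + ... + (n-1).
    return n*(n - 1)//2

def copies_doubling( n, capacity = 16 ):
    total = 0
    size = 0
    for i in xrange( n ):
        if size + 1 > capacity:
            total += size       # a reallocation copies the old contents
            capacity = 2*capacity
        size = size + 1
    return total

for n in (10000, 20000, 40000):
    print( ( n, copies_exact( n ), copies_doubling( n ) ) )

Doubling n quadruples copies_exact but only roughly doubles copies_doubling.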


>>> def exact_appender( n ):
...     def makestr( n = n ):
...         s = ""
...         for i in xrange( n ):
...             new_s = s + "A"
...             s = new_s
...         return s
...     return makestr
...
>>> exact_appender( 1000 )() == 1000*"A"
True
>>> for i in xrange( 10 ):
...     print( elapsed_time_for( exact_appender( 10000*(i+1) ), 100 ) )
...
3.28094241027
9.30584501661
19.5319170453
33.6563767183
52.3327800042
66.5475022663
84.8809736992
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in elapsed_time_for
  File "C:\Program Files\cpython\python26\lib\timeit.py", line 227, in timeit
    return Timer(stmt, setup, timer).timeit(number)
  File "C:\Program Files\cpython\python26\lib\timeit.py", line 193, in timeit
    timing = self.inner(it, self.timer)
  File "C:\Program Files\cpython\python26\lib\timeit.py", line 99, in inner
    _func()
  File "<stdin>", line 5, in makestr
KeyboardInterrupt
>>> _


So, given that the simple '+' in the first example apparently is optimized using an expanding buffer, which then hangs around as the final string's allocation, it's not clear to me that the space optimization in 'join' really helps. It may be (but isn't necessarily) like shoveling snow in a snowstorm: the effort/cost could be for naught.


> And because strings are immutable (unlike lists and dicts, which also
> use this approach), you can never use that memory until the string is
> garbage collected.

I think that the simple '+', with the apparent optimization shown in the first example above, can reuse that space. I know for a fact that when you control the string implementation it can do that, since I've implemented exactly that. But I don't know whether it's practical to do so in Python: in order to reuse the space the implementation must know that there's only one reference to the string, and I don't know whether that information is readily available in a CPython implementation (say), although I suspect that it is.
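
For what it's worth, CPython does expose reference counts at the language level, so the information is at least there; whether the '+' optimization actually consults it I can't say for certain (I believe the check in CPython 2.x is a refcount test plus a peek at the next bytecode, but treat that as hearsay). A quick check with sys.getrefcount, which reports one extra reference for its own argument:

import sys

s = "A"*1000
print( sys.getrefcount( s ) )   # 2: 's' itself plus getrefcount's argument

t = s                           # a second user-visible reference
print( sys.getrefcount( s ) )   # 3: in-place reuse is no longer safe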


Cheers,

- Alf
--
http://mail.python.org/mailman/listinfo/python-list
