Bugs item #1085744 was opened at 2004-12-15 07:15
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1085744&group_id=5470
Category: Python Interpreter Core
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Nick Coghlan (ncoghlan)
Assigned to: Raymond Hettinger (rhettinger)
Summary: PySequence_Tuple not as fast as PySequence_List

Initial Comment:
A little "my code is faster than your code" game on python-list turned up
a pathological interaction between itertools.chain and str.join:

C:\>python -m timeit -s "data = [map(str, range(x)) for x in range(1000)]; from itertools import chain" "''.join(chain(*data))"
10 loops, best of 3: 1.2 sec per loop    ****** OUCH!!

Some extra experiments show that creating a list from the result of chain
is fast, but creating a tuple is horrendously slow:

C:\>python -m timeit -s "data = [map(str, range(x)) for x in range(1000)]; from itertools import chain" "''.join(list(chain(*data)))"
10 loops, best of 3: 107 msec per loop

C:\>python -m timeit -s "data = [map(str, range(x)) for x in range(1000)]; from itertools import chain" "''.join(tuple(chain(*data)))"
10 loops, best of 3: 1.2 sec per loop

Creating the tuple by way of a list is actually faster than creating the
tuple directly:

C:\>python -m timeit -s "data = [map(str, range(x)) for x in range(1000)]; from itertools import chain" "''.join(tuple(list(chain(*data))))"
10 loops, best of 3: 146 msec per loop

A check with imap suggests the problem may be a more general interaction
between PySequence_Fast and iterators that don't have __len__ methods:

C:\>python -m timeit -s "data = [y for x in range(1000) for y in map(str, range(x))]; from itertools import imap; val = lambda arg: arg" "''.join(list(imap(val, data)))"
10 loops, best of 3: 310 msec per loop

C:\>python -m timeit -s "data = [y for x in range(1000) for y in map(str, range(x))]; from itertools import imap; val = lambda arg: arg" "''.join(imap(val, data))"
10 loops, best of 3: 1.4 sec per loop

Looking at the code supports that, too: PySequence_Fast uses
PySequence_Tuple, which is great when PyObject_Size gives a nice answer,
but can lead to a lot of tuple resizing when it doesn't (one resize per 10
items beyond 10 up to 500, then one resize per 100 items beyond 500).
2.4's optimised list extension means that this *really* hurts
performance-wise.

The other aspect is whether some of the utilities in itertools could
benefit from a __len__ method that returned a sensible answer if their
inputs defined __len__ methods, and returned -1 otherwise (this would
certainly work well with PySequence_Tuple, but I realise it's slightly odd
behaviour for a __len__ method).

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-12-16 05:39

Message:
Logged In: YES
user_id=80475

Loaded the patch.  See Objects/abstract.c 2.133

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-12-15 16:27

Message:
Logged In: YES
user_id=80475

Attaching a small patch for PySequence_Tuple. Try it out and let me know
whether you find it to be an improvement. It also adds a bit of error
checking that should have been there anyway.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-12-15 12:14

Message:
Logged In: YES
user_id=80475

P.S. In your example, you can get an immediate performance improvement by
having PySequence_Fast use PySequence_List instead of PySequence_Tuple.

If you want to work on some Py2.5 optimizations along these lines, look at
replacing the somewhat arbitrary over-allocation strategy in
PySequence_Tuple. If you're super-motivated, see if you can improve on the
algorithm for str.join. Since these are just strategy changes, they will
improve some cases at the expense of others. The goal is to find the one
that does best most of the time, is not horrible in the worst cases, and
makes the fewest demands on memory.
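To get a rough feel for the trade-off, here is a quick pure-Python
simulation of the two growth strategies (a sketch only, not the
interpreter's C code: the tuple constants follow the resizing pattern
described in the report, and the list formula approximates the one in
Objects/listobject.c, so the exact counts inside CPython may differ):

    # Sketch of the two over-allocation strategies described above;
    # the constants are approximations, not the interpreter's own code.

    def tuple_resizes(n):
        # Fixed-increment strategy: grow by 10 up to 500 items, then by 100.
        resizes, allocated = 0, 10
        while allocated < n:
            allocated += (allocated < 500) and 10 or 100
            resizes += 1
        return resizes

    def list_resizes(n):
        # Proportional strategy: over-allocate by roughly 1/8 each time.
        resizes, allocated = 0, 0
        for size in xrange(1, n + 1):
            if size > allocated:
                allocated = size + (size >> 3) + (size < 9 and 3 or 6)
                resizes += 1
        return resizes

    for n in (1000, 100000, 500000):
        print n, tuple_resizes(n), list_resizes(n)

For half a million items the fixed-increment strategy implies several
thousand reallocations (each one copying an ever-larger block), versus
well under a hundred for the proportional one -- which is why
tuple(chain(*data)) loses so badly to list(chain(*data)) above.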
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-12-15 10:04

Message:
Logged In: YES
user_id=80475

Remain calm ;-)  Also, please isolate the issues instead of lumping
everything together. And this is likely best handled by email rather than
SF (you're asking me to explain all of your timings rather than pointing
to a buggy piece of code).

The idea (feature request) to make iterators length-transparent has
already been explored and reached a dead end. I've implemented it already
in the few situations where it can work (for example,
itertools.repeat(obj, n) and dict.iteritems() report their length). Most
other situations run into logical inconsistencies because mutable
iterables are indistinguishable from iterators over those iterables. See
Lib/test/test_iterlen.py for the gory details; the sketch at the end of
this comment makes the inconsistency concrete.

Writing f(*it) is also its own issue -- for uses other than argument
passing, Guido considers it an abuse (since * unwinds its components onto
ceval's stack).

Likewise, there is nothing unique to itertools here. All of the timings
show similar results if generators are used instead of itertools. This
issue is really about how consumers handle unsized iterable inputs. You
cover three different consumers -- ''.join(it), list(it), and tuple(it) --
which all take different approaches to unsized iterable inputs, so it is
no surprise that the three have different performance characteristics.

str.join() is its own little bundle of joy whose behavior is dictated by
its need to make multiple passes over the input. Improving its handling
of unsized iterable inputs is a thorny subject. You're welcome to post a
patch.

The best way to analyze what is going on is to disregard the timings and
instead draw little diagrams of what is in memory at any given time.
Also, draw a data migration path -- you want the algorithm to move the
elements as few times as possible. Be sure to handle cases like
dict.iteritems() which know their length but are not a list or tuple.
Also handle cases where the inputs are created on the fly (not already
held in memory as in your example): ''.join(str(i%10) for i in xrange(n)).

FYI, there was an interesting bug report on this subject a year ago.
IIRC, the OP saw that memory consumption was greater with an iterable
input than if the iterable had first been wrapped in list().

I don't see anything out of whack in the creation of lists and tuples
from unsized iterables. There are simply two different overallocation
algorithms; consequently, there will always be situations where one
outperforms the other. For your giant-sized examples, the tuple growth
strategy of fixed-size overallocations does poorly against the list
strategy of increasingly large overallocations. In contrast, tuples can
do better for certain small sizes. If you think there are critical use
cases for tuple(it) with a large, unsized it, then you can propose a
variable-length strategy like the one used for lists. If such use cases
exist, I suspect they don't fit well with Guido's intended philosophy on
tuple use.

Am closing this as a bug report; it is more a mixture of questions
(explain this timing) and feature requests. Feel free to send me emails
to explore this further. I've been through all of this before and it is a
substantial analysis project. If you go down this path, you may well find
some room for improvement (in str.join in particular). Also, be sure to
separate the issues. It is not at all helpful to simultaneously cover
itertools, *args, lists vs. tuples, unsized iterators, str.join,
list.extend, and a proposal to make iterators report their length.
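To make the length-transparency dead end concrete, here is a hypothetical
wrapper (sized_iter is illustrative only; nothing like it ships in
itertools or the patch) that forwards a length when its input is sized,
followed by the mutation problem that sinks the idea in general:

    class sized_iter(object):
        # Hypothetical helper for illustration; not part of itertools.
        """Iterate over `iterable`, reporting len() when the source is sized."""
        def __init__(self, iterable):
            self._it = iter(iterable)
            try:
                self._remaining = len(iterable)
            except TypeError:
                self._remaining = None          # unsized input
        def __iter__(self):
            return self
        def next(self):                         # Py2 iterator protocol
            item = self._it.next()
            if self._remaining is not None:
                self._remaining -= 1
            return item
        def __len__(self):
            if self._remaining is None:
                raise TypeError("length is unknown")
            return self._remaining

    data = range(5)
    it = sized_iter(data)
    data.append(99)     # len(it) still claims 5 items remain,
                        # but iterating will in fact yield six

The cached count and the underlying container can silently disagree, which
is why only fixed-size cases such as itertools.repeat(obj, n) can report a
length safely.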
----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2004-12-15 07:17

Message:
Logged In: YES
user_id=1038590

Kicking this in your direction, Raymond, since it can badly affect the
performance of itertools.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1085744&group_id=5470