New submission from Stefan Behnel <sco...@users.sourceforge.net>:

In the context of better interfacing of PyPy with Cython, it appears that 
simple-looking things like PyTuple_GET_ITEM() are often rather involved in 
PyPy's C-API implementation. However, since functions/macros like these are 
used very frequently, this has an effect on the achievable performance.

It occurred to me that there are cases that involve many C-API calls where the 
intention is simply to unpack a sequence (or iterable) of known length, often 
just 2 or 3 items. Argument unpacking is one such situation (for which there 
are appropriate C-API functions); dict item iteration and iteration over 
enumerate() are other well-known cases (at least in Python space). As the one 
obvious way to handle the general use case, I propose the following addition of 
a convenience function to the C-API:

int PyIter_Unpack(PyObject* iterable, Py_ssize_t min_unpack,
                  Py_ssize_t max_unpack, ...)

As indicated by the names, it's meant to unpack any iterable or iterator, 
really, i.e. it would fall back to iteration if the iterable is neither a tuple 
nor list, for which special handling code makes the most sense. I thought about 
naming it PySequence_Unpack(), but that would imply that it should reject 
unordered (or, for safety, any unknown) iterables and non-sequence iterators as 
input, which IMHO would complicate matters more than it would help. A warning 
about unordered iterables in the documentation should be enough. I would expect 
that most users would actually know the type of sequence that they are 
processing.

The "max_unpack" parameter gives the number of varargs that follow, which are 
all either of type PyObject** or NULL, the latter indicating that the value is 
not of interest. Non-NULL pointers will receive a new reference to the item at 
the corresponding index.

The "min_unpack" parameter is made available for error checking. If fewer items 
are found in the iterable, the function sets a ValueError and returns -1. 
Assignments may or may not have taken place at this point, but no owned 
references are passed back in this case. If, on successful unpacking, the 
number of unpacked items is smaller than "max_unpack", all remaining item 
pointers will be set to NULL. Users who do not care about the number of items 
would pass 0 and those who know the exact length would pass that as both 
"min_unpack" and "max_unpack".

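To make the intended behaviour of "min_unpack" and "max_unpack" concrete, here 
is a rough pure-Python model of the semantics described above. It is only a 
sketch: the name iter_unpack() is hypothetical, the result is returned as a 
list instead of being written through PyObject** arguments, None stands in for 
NULL, and left-over items are simply ignored (one of the options still under 
discussion).

```python
def iter_unpack(iterable, min_unpack, max_unpack):
    """Hypothetical pure-Python model of the proposed PyIter_Unpack()."""
    it = iter(iterable)
    items = []
    # Take at most max_unpack items from the iterable.
    for _ in range(max_unpack):
        try:
            items.append(next(it))
        except StopIteration:
            break
    # Fewer than min_unpack items is an error (ValueError in the proposal).
    if len(items) < min_unpack:
        raise ValueError(
            "need at least %d values to unpack, got %d"
            % (min_unpack, len(items)))
    # Remaining slots are filled with None, modelling NULL item pointers.
    return items + [None] * (max_unpack - len(items))


key, value = iter_unpack(("a", 1), 2, 2)   # exact length: min == max
first, second, third = iter_unpack([10], 0, 3)  # length not checked
```

In this model, the second call leaves "second" and "third" as None, which 
corresponds to the remaining item pointers being set to NULL.
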
There is one case I'm not sure about yet, and that's how to handle the case of 
finding more items than "max_unpack" requests. I think it's just as convenient 
in some cases to automatically raise an exception, as it is in other cases to 
just ignore them. I think a way to solve this could be to not raise an 
exception, but to return 0 when all items were processed and 1 when there are 
remaining items. In this case, users who care could check the result and if 
they consider left-over items an error, clean up the returned references and 
raise an error manually. Alternatively, the function could return the number of 
unpacked items, but that may involve more work on the user side in order to 
find out what needs to be done. The drawback of a tristate return, with and 
without an error set, is that the straightforward "if (PyIter_Unpack(...))" check 
is no longer enough to correctly detect and propagate errors. Also, when 
passing an iterator, the function would have to consume one more value in order 
to determine the return code. That may not be what the caller wants.
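
The iterator problem can be illustrated with a small pure-Python model of the 
0/1 return variant. This is a sketch under assumptions: the function name and 
the (items, flag) return shape are hypothetical and only mimic the proposed 
return codes.

```python
def unpack_with_leftover_flag(iterable, max_unpack):
    """Hypothetical model: return (items, 1) if items remain, else (items, 0)."""
    it = iter(iterable)
    items = []
    for _ in range(max_unpack):
        try:
            items.append(next(it))
        except StopIteration:
            return items, 0
    # To know whether anything is left, one more value has to be fetched.
    # For a plain iterator, that value is consumed and lost to the caller.
    try:
        next(it)
    except StopIteration:
        return items, 0
    return items, 1


source = iter([1, 2, 3, 4])
items, has_more = unpack_with_leftover_flag(source, 2)
remaining = list(source)  # the value 3 was consumed by the probe
```

After the call, "items" is [1, 2] and "has_more" is 1, but "remaining" is only 
[4]: the probe for left-over items has silently dropped one value.
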

Maybe an additional flag parameter ("check_size") could solve this. If true, 
the function will check the size of sequences and report longer sequences as 
errors, and for iterators, will unpack the next item and report it as error if 
available. If false, additional values will be ignored for sequences and no 
attempt will be made for iterators to unpack more items than requested.
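
A pure-Python model of the "check_size" variant might look as follows. Again, 
this is only a sketch of the behaviour described above: the function name is 
hypothetical, None models NULL, and raising ValueError for over-long input is 
an assumption, since the text does not name the exception type for that case.

```python
def iter_unpack_checked(iterable, max_unpack, check_size):
    """Hypothetical model of PyIter_Unpack() with a check_size flag."""
    if isinstance(iterable, (tuple, list)):
        # Sequences: the length is known, so nothing has to be consumed.
        if check_size and len(iterable) > max_unpack:
            raise ValueError("too many values to unpack")
        items = list(iterable[:max_unpack])
    else:
        it = iter(iterable)
        items = []
        for _ in range(max_unpack):
            try:
                items.append(next(it))
            except StopIteration:
                break
        else:
            # All requested items were found; probe for one more only
            # if the caller explicitly asked for a size check.
            if check_size:
                try:
                    next(it)
                except StopIteration:
                    pass
                else:
                    raise ValueError("too many values to unpack")
    return items + [None] * (max_unpack - len(items))


source = iter([1, 2, 3])
pair = iter_unpack_checked(source, 2, False)
untouched = list(source)  # with check_size false, 3 is still available
```

With check_size false, the iterator is left positioned right after the 
requested items, so the caller can keep using it.
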


Because of the questions above, and because this addition involves a certain 
redundancy with what's there already (namely the argument and tuple unpacking 
functions which do not work on lists or arbitrary iterables and/or raise the 
wrong exceptions), I'm asking for comments before writing up a patch. Any 
thoughts on this?

----------
components: Interpreter Core
messages: 154217
nosy: scoder
priority: normal
severity: normal
status: open
title: add a convenience C-API function for unpacking iterables
type: enhancement
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14121>
_______________________________________