New submission from STINNER Victor <vstin...@python.org>:

The C API of Python uses and abuses borrowed references and stolen references 
for performance. Such functions make sense in very specific, 
performance-critical code, but problems arise when they are the only way to 
access objects. Reference counting in C is error-prone: most people, even 
experienced core developers, get it wrong. Examples of issues:

* Reference leaks: objects are never deleted, causing memory leaks. For example, 
an error handling path which forgets to call Py_DECREF() on a newly created 
object.

* Unsafe borrowed references: calling arbitrary Python code can delete the 
referenced object, and so the borrowed reference becomes a dangling pointer. 
Most developers are confident that a given function call cannot run arbitrary 
Python code, whereas a single Py_DECREF() can trigger a GC collection which 
runs finalizers, which can be arbitrary Python code. Many functions have been 
fixed manually by adding Py_INCREF() and Py_DECREF() around "unsafe" function 
calls.
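
The "unsafe borrowed reference" problem and its manual fix can be sketched 
without the real C API. The following is a toy model only: Obj, Box, 
live_objects and all obj_*/box_* helpers are hypothetical names standing in 
for PyObject, a container, and Py_INCREF()/Py_DECREF(); box_clear() plays the 
role of the call that runs arbitrary code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for PyObject (hypothetical): a refcount and a payload. */
typedef struct {
    long refcnt;
    int value;
} Obj;

static int live_objects = 0;   /* number of objects not yet freed */

static Obj *obj_new(int value) {
    Obj *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->refcnt = 1;             /* the caller owns the first reference */
    o->value = value;
    live_objects++;
    return o;
}

static void obj_incref(Obj *o) { o->refcnt++; }

static void obj_decref(Obj *o) {
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        live_objects--;
        free(o);
    }
}

/* One-slot container that owns a strong reference to its item. */
typedef struct { Obj *item; } Box;

/* Returns a borrowed reference: the caller does not own it. */
static Obj *box_get_borrowed(Box *b) { return b->item; }

/* Stand-in for "arbitrary code" that drops the container's reference. */
static void box_clear(Box *b) {
    Obj *old = b->item;
    b->item = NULL;
    if (old != NULL) obj_decref(old);
}

/* Safe pattern: hold a strong reference across the unsafe call. */
static int read_after_clear(Box *b) {
    Obj *item = box_get_borrowed(b);
    obj_incref(item);          /* without this, item could dangle below */
    box_clear(b);              /* would free item if we held no reference */
    int value = item->value;   /* safe: we still own one reference */
    obj_decref(item);
    return value;
}
```

Removing the obj_incref()/obj_decref() pair in read_after_clear() would turn 
the read of item->value into a use-after-free, which is the class of bug 
described above.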


Borrowed references and stolen references make reference counting code 
irregular and even more complex to review. I propose to add new functions to 
make reference counting code more regular, simpler to review, and so less 
error-prone.

Examples:

* Add PyTuple_GetItemRef(): similar to PyTuple_GetItem() but returns a strong 
reference (or NULL if the tuple item is not set)
* Add PyTuple_SetItemRef(): similar to PyTuple_SetItem() but doesn't steal a 
reference to the new item
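
As a rough illustration of the intended semantics, here is a toy model rather 
than the real API: ToyTuple, Obj and every toy_*/obj_* function are 
hypothetical names, and the setter shown also releases the old item, which is 
one possible design and not a settled part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for PyObject (hypothetical, not the CPython layout). */
typedef struct { long refcnt; } Obj;

static Obj *obj_new(void) {
    Obj *o = malloc(sizeof *o);
    if (o != NULL) o->refcnt = 1;   /* the caller owns the first reference */
    return o;
}

static void obj_xincref(Obj *o) { if (o != NULL) o->refcnt++; }

static void obj_xdecref(Obj *o) {
    if (o == NULL) return;
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) free(o);
}

/* Toy fixed-size tuple; each non-NULL slot owns one reference. */
#define TOY_TUPLE_SIZE 2
typedef struct { Obj *items[TOY_TUPLE_SIZE]; } ToyTuple;

/* Borrowed-reference getter: the caller must not decref the result. */
static Obj *toy_tuple_get_item(ToyTuple *t, size_t i) {
    return (i < TOY_TUPLE_SIZE) ? t->items[i] : NULL;
}

/* "Ref" getter: returns a strong reference, or NULL if the slot is
   unset or the index is out of range. The caller must decref it. */
static Obj *toy_tuple_get_item_ref(ToyTuple *t, size_t i) {
    Obj *item = toy_tuple_get_item(t, i);
    obj_xincref(item);
    return item;
}

/* "Ref" setter: takes its own reference instead of stealing the
   caller's, and releases the item previously stored in the slot. */
static int toy_tuple_set_item_ref(ToyTuple *t, size_t i, Obj *item) {
    if (i >= TOY_TUPLE_SIZE) return -1;
    Obj *old = t->items[i];
    obj_xincref(item);      /* incref first, in case item == old */
    t->items[i] = item;
    obj_xdecref(old);
    return 0;
}
```

With both functions, every reference the caller obtains is one the caller 
releases, and every reference the caller passes in is still owned by the 
caller afterwards.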

The C API has a long list of functions using borrowed references, so I'm not 
sure where we should stop. I propose to start with the most common functions: 
PyDict, PyTuple, PyList, and see how it goes.

--

PyTuple_GetItem() is a function call which checks its arguments: it raises an 
exception if the arguments are invalid. For best performance, the 
PyTuple_GET_ITEM() macro is provided to skip these checks. This macro also 
returns a borrowed reference.

I'm not sure if a new PyTuple_GET_ITEM_REF() macro should be added: similar to 
PyTuple_GET_ITEM() but returns a strong reference.

Same open question about the PyTuple_SET_ITEM(tuple, index, item) macro, which 
is also special:

* It doesn't call Py_XINCREF(item)
* It doesn't call Py_XDECREF() on the old item

If a new PyTuple_SET_ITEM_REF() macro is added, I would prefer to make the 
function more "regular" in terms of reference counting, and so call 
Py_XDECREF() on the old item. When used on a newly created tuple, it would add 
a useless Py_XDECREF(NULL), compared to PyTuple_SET_ITEM(). Again, my idea here 
is to provide functions with less surprising behavior and more regular 
reference counting. There are alternatives to build a new tuple without the 
useless Py_XDECREF(NULL), like Py_BuildValue().

Code which requires best performance could continue to use PyTuple_SET_ITEM() 
which is not deprecated, and handle reference counting manually.

--

An alternative is to use abstract functions like:

* PyTuple_GetItem() => PySequence_GetItem()
* PyDict_GetItem() => PyObject_GetItem()
* etc.

I propose to keep specialized functions per type to avoid the overhead of 
indirection. For example, PySequence_GetItem(obj, index) calls 
Py_TYPE(obj)->tp_as_sequence->sq_item(obj, index) which implies multiple 
indirections:

* Get the object type from PyObject.ob_type
* Dereference *type to get PyTypeObject.tp_as_sequence
* Dereference *PyTypeObject.tp_as_sequence to get PySequenceMethods.sq_item
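
The chain of dependent loads listed above can be sketched with toy structures. 
Only the field names ob_type, tp_as_sequence and sq_item come from the text; 
the Toy* types, the int-valued items and the -1 error value are simplifying 
assumptions, not the CPython layout.

```c
#include <assert.h>
#include <stddef.h>

/* Toy types mirroring only the fields named above (hypothetical layout). */
struct ToyObject;
typedef int (*sq_item_func)(struct ToyObject *, size_t);

typedef struct { sq_item_func sq_item; } ToySequenceMethods;
typedef struct { ToySequenceMethods *tp_as_sequence; } ToyType;
typedef struct ToyObject {
    ToyType *ob_type;
    const int *data;
    size_t size;
} ToyObject;

/* Type-specific getter: reads the storage directly.
   Returns -1 on a bad index (toy error convention). */
static int toy_array_get_item(ToyObject *obj, size_t i) {
    return (i < obj->size) ? obj->data[i] : -1;
}

/* Generic getter: each step depends on the result of the previous load. */
static int toy_sequence_get_item(ToyObject *obj, size_t i) {
    ToyType *type = obj->ob_type;                    /* 1. object -> type */
    ToySequenceMethods *seq = type->tp_as_sequence;  /* 2. type -> methods */
    if (seq == NULL || seq->sq_item == NULL) return -1;
    return seq->sq_item(obj, i);                     /* 3. methods -> slot,
                                                        then indirect call */
}

static ToySequenceMethods toy_array_as_sequence = { toy_array_get_item };
static ToyType toy_array_type = { &toy_array_as_sequence };
```

Both getters return the same value for a toy array; the generic one simply 
pays for the extra pointer chasing and the indirect call.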

--

I don't plan to get rid of borrowed references. Sometimes, they are safe and 
replacing them with strong references would require explicit reference counting 
code which is again easy to get wrong.

For example, Py_TYPE() returns a borrowed reference to an object type. The 
function is commonly used to immediately access a type member, with no risk 
of calling arbitrary Python code between the Py_TYPE() call and the read of the 
type attribute. For example, the following code is perfectly safe:

        PyErr_Format(PyExc_TypeError,
                     "exec() globals must be a dict, not %.100s",
                     Py_TYPE(globals)->tp_name);


--

See also bpo-42262 where I added Py_NewRef() and Py_XNewRef() functions.

See https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references for 
details about issues caused by borrowed references and a list of functions 
using borrowed references.

----------
components: C API
messages: 380578
nosy: vstinner
priority: normal
severity: normal
status: open
title: [C API] Add new C functions with more regular reference counting like 
PyTuple_GetItemRef()
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42294>
_______________________________________