STINNER Victor added the comment:

Changes in my current implementation, ad4a53ed1fbf.diff.
The good thing is that all changes are internal (really?). Even if you don't modify your C extensions (nor your Python code), you should benefit from the new fast call in *a lot* of cases.

IMHO the trickiest part is the changes to PyTypeObject. Is it ok to add a new tp_fastcall slot? Should we add even more slots using the fast call convention, like tp_fastnew and tp_fastinit? How should we handle the inheritance of types with that?

(*) Add 2 new public functions:

   PyObject* PyObject_CallNoArg(PyObject *func);
   PyObject* PyObject_CallArg1(PyObject *func, PyObject *arg);

(*) Add 1 new private function:

   PyObject* _PyObject_FastCall(PyObject *func, PyObject **stack, int na, int nk);

   _PyObject_FastCall() is the root of the new feature.

(*) type: add a new "tp_fastcall" field to the PyTypeObject structure.

   It's unclear to me how inheritance is handled here. Maybe it's simply broken, but that would be strange because it looks like it works :-) Maybe it's very rare that tp_call is overridden in a child class?

   TODO: maybe reuse the "tp_call" field? (risk of major backward incompatibility...)

(*) slots: add a new "fastwrapper" field to the wrapperbase structure. Add a fast wrapper to all slots (really all? I should check).

   I don't think that consumers of the C API are affected by this change, or maybe only a few projects.

   TODO: maybe remove "fastwrapper" and reuse the "wrapper" field? (low risk of backward incompatibility?)

(*) Implement fast call for Python functions (_PyFunction_FastCall) and C functions (PyCFunction_FastCall)

(*) Add a new METH_FASTCALL calling convention for C functions. Right now, it is used for 4 builtin functions: sorted(), getattr(), iter(), next().

   Argument Clinic should be modified to emit C code using this new fast calling convention.

(*) Implement fast call in the following functions (types):

   - method()
   - method_descriptor()
   - wrapper_descriptor()
   - method_wrapper()
   - operator.itemgetter => used by collections.namedtuple to get an item by its name

(*) Modify the PyObject_Call*() functions to reuse the fast call internally. "tp_fastcall" is preferred over "tp_call" (FIXME: is it really useful to do that?).

   The following functions are able to avoid the temporary tuple/dict without having to modify the code calling them:

   - PyObject_CallFunction()
   - PyObject_CallMethod(), _PyObject_CallMethodId()
   - PyObject_CallFunctionObjArgs(), PyObject_CallMethodObjArgs()

   Code using these functions does not have to be modified to use the 3 new shiny functions (PyObject_CallNoArg, PyObject_CallArg1, _PyObject_FastCall). For example, replacing PyObject_CallFunctionObjArgs(func, NULL) with PyObject_CallNoArg(func) is just a micro-optimization: the tuple is already avoided. But PyObject_CallNoArg() should use less C stack memory and be a "little bit" faster.

(*) Add new helpers: new Include/pystack.h file, Py_VaBuildStack(), etc.

Please ignore unrelated changes.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26814>
_______________________________________
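
To make the idea behind _PyObject_FastCall() more concrete, here is a toy model in plain C. It is not code from the patch and does not use the CPython API: every toy_* name is hypothetical, arguments are plain long values instead of PyObject pointers, and keyword arguments (the "nk" part) are left out. It only sketches the difference between packing arguments into a temporary tuple and passing a pointer to an existing array of arguments.

```c
/* Toy model of the "fast call" idea. NOT CPython code:
   all toy_* names are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdlib.h>

/* Modelled loosely on _PyObject_FastCall(func, stack, na, nk):
   positional arguments are read straight from a C array. */
typedef long (*toy_fastfunc)(const long *stack, int na);

/* Old-style convention: arguments arrive packed in a heap "tuple". */
typedef struct {
    int size;
    long *items;
} toy_tuple;

typedef long (*toy_tuplefunc)(const toy_tuple *args);

/* Counts temporary tuples so that the saving is observable. */
static int toy_tuples_allocated = 0;

/* Example callee: sums its positional arguments. */
static long toy_sum_fast(const long *stack, int na)
{
    long total = 0;
    for (int i = 0; i < na; i++) {
        total += stack[i];
    }
    return total;
}

/* The same callee behind the tuple-based convention. */
static long toy_sum_tuple(const toy_tuple *args)
{
    return toy_sum_fast(args->items, args->size);
}

/* Old path: build a temporary tuple, call, then free the tuple. */
static long toy_call_with_tuple(toy_tuplefunc func, const long *values, int na)
{
    toy_tuple args;
    args.size = na;
    args.items = malloc(sizeof(long) * (size_t)(na > 0 ? na : 1));
    if (args.items == NULL) {
        abort();
    }
    toy_tuples_allocated++;
    for (int i = 0; i < na; i++) {
        args.items[i] = values[i];
    }
    long result = func(&args);
    free(args.items);
    return result;
}

/* Fast path: hand the caller's array to the callee, no allocation. */
static long toy_fastcall(toy_fastfunc func, const long *stack, int na)
{
    assert(na >= 0);
    return func(stack, na);
}

/* Rough analogues of PyObject_CallNoArg() and PyObject_CallArg1(). */
static long toy_call_noarg(toy_fastfunc func)
{
    return toy_fastcall(func, NULL, 0);
}

static long toy_call_arg1(toy_fastfunc func, long arg)
{
    return toy_fastcall(func, &arg, 1);
}
```

In this model, toy_call_with_tuple() pays for one allocation per call, while toy_fastcall(), toy_call_noarg() and toy_call_arg1() never allocate: the single argument of toy_call_arg1() lives on the C stack of the caller. The real implementation has to deal with reference counting, keyword arguments and type slots, which this sketch ignores.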