STINNER Victor added the comment: "While I feel your work is great, performance benefit seems very small, compared complexity of this patch."
I have to agree. I spent a lot of time benchmarking these tp_fast* changes. While one or two benchmarks are faster, that is not really the case for the others.

I also agree about the complexity. In Python 3.6, most FASTCALL changes were internal. For example, PyObject_CallFunctionObjArgs() now uses FASTCALL internally, without callers of the API having to be modified. I tried to use _PyObject_FastCallDict() and _PyObject_FastCallKeywords() only in the few places where the speedup was significant.

The main visible change of Python 3.6 FASTCALL is the new METH_FASTCALL calling convention for C functions. Your change modifying print() to use METH_FASTCALL has a significant impact on the telco benchmark, without any drawback. I tested further changes to use METH_FASTCALL in the struct and decimal modules, and they optimize telco even more.

To continue the optimization work, I guess that using METH_FASTCALL in more cases, through Argument Clinic whenever possible, would have a more concrete and measurable impact on performance than this big tp_fastcall patch.

But I'm not ready to abandon the whole approach yet, so I am changing the status to pending. I may come back in one or two months to check whether I missed anything obvious that would unlock even more optimizations ;-)

----------
status: open -> pending

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29259>
_______________________________________