New submission from Eric Snow <ericsnowcurren...@gmail.com>:
`_Py_Identifier` has been useful, but at this point there is a faster and simpler approach we could take as a replacement: statically initialize the objects as fields on `_PyRuntimeState` and reference them directly through a macro.

This would involve the following:

* add a `PyUnicodeObject` field (not a pointer) to `_PyRuntimeState` for each string that currently uses `_Py_IDENTIFIER()`
* initialize each object as part of the static initializer for `_PyRuntimeState`
* make each "immortal" (e.g. start with a really high refcount)
* add a macro to look up a given string
* update each location that currently uses `_Py_IDENTIFIER()` to use the new macro instead

As part of this, we would also do the following:

* get rid of all C-API functions with `_Py_Identifier` parameters
* get rid of the old runtime state related to identifiers
* get rid of `_Py_Identifier`, `_Py_IDENTIFIER()`, etc.

(Note that there are several hundred uses of `_Py_IDENTIFIER()`, including a number of duplicates.)

Pros:

* reduces indirection (and extra calls) for C-API functions using the strings (making the code easier to understand and speeding it up)
* the objects are referenced from a fixed address in the static data section (speeding things up and allowing the C compiler to optimize better)
* there is no lazy allocation (or lookup, etc.), so there are fewer possible failures when the objects get used (thus less error return checking)
* simplifies the runtime state
* saves memory (a little, at least)
* the per-interpreter approach is simpler (if needed)
* reduces the number of static variables in any given C module
* reduces the number of functions in the ("private") C-API
* "deep frozen" modules can use these strings
* other commonly-used strings could be pre-allocated by adding `_PyRuntimeState` fields for them

Cons:

* churn
* adding a string to the list requires modifying a separate file from the one where you actually want to use the string
* strings can get "orphaned" (we could prevent this with a check in `make check`)
* some PyPI packages may rely on `_Py_IDENTIFIER()` (even though it is "private" C-API)
* some strings may never get used for any given ./python invocation

Note that with a basic partial implementation (GH-30928) I'm seeing a 1% improvement in performance (see https://github.com/faster-cpython/ideas/issues/230).

----------
assignee: eric.snow
components: Interpreter Core
messages: 411799
nosy: eric.snow, serhiy.storchaka, vstinner
priority: normal
pull_requests: 29107
severity: normal
stage: needs patch
status: open
title: Replace _Py_IDENTIFIER() with statically initialized objects.
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46541>
_______________________________________
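
For illustration only, the general shape of the proposal (string objects embedded as fields in one statically initialized runtime struct, marked "immortal", and reached through a macro) can be sketched in plain C. This is a toy model and not CPython code: `toy_str`, `toy_runtime_state`, `toy_runtime`, `TOY_STR_INIT`, `TOY_IMMORTAL_REFCNT`, `TOY_ID()`, and `toy_str_equals()` are all hypothetical names standing in for `PyUnicodeObject`, `_PyRuntimeState`, and the proposed lookup macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PyUnicodeObject: a refcount plus the text. */
typedef struct {
    long refcnt;
    size_t length;
    const char *data;
} toy_str;

/* Assumption: "immortal" is modeled as a very high starting refcount. */
#define TOY_IMMORTAL_REFCNT 1000000000L

/* Static initializer for one string object; no allocation at runtime. */
#define TOY_STR_INIT(literal) \
    { TOY_IMMORTAL_REFCNT, sizeof(literal) - 1, (literal) }

/* Hypothetical stand-in for _PyRuntimeState: one field (not a pointer)
   per pre-allocated string. */
typedef struct {
    struct {
        toy_str keys;
        toy_str items;
        toy_str join;
    } strings;
} toy_runtime_state;

/* The whole state, strings included, lives in the static data section. */
toy_runtime_state toy_runtime = {
    .strings = {
        .keys  = TOY_STR_INIT("keys"),
        .items = TOY_STR_INIT("items"),
        .join  = TOY_STR_INIT("join"),
    },
};

/* Lookup macro: resolves to a fixed address, so there is no lazy
   initialization and no failure path to check. */
#define TOY_ID(NAME) (&toy_runtime.strings.NAME)

/* Helper that compares a toy string with a C string. */
int toy_str_equals(const toy_str *s, const char *text)
{
    return s->length == strlen(text)
        && memcmp(s->data, text, s->length) == 0;
}
```

In this sketch a call site would write something like `toy_str_equals(TOY_ID(join), "join")` and never has to handle a NULL result, which is the "less error return checking" point from the list of pros. It also shows the first con: adding a new string means editing the struct definition and its initializer rather than the file that uses the string.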