New submission from Inada Naoki <songofaca...@gmail.com>:
When PGO is not used, compilers don't know which parts are hot. As a result, gcc fails to inline the hot code in pymalloc_alloc and pymalloc_free into _PyObject_Malloc and _PyObject_Free. For example, only this code is inlined into _PyObject_Malloc:

    if (nbytes == 0) {
        return 0;
    }
    if (nbytes > SMALL_REQUEST_THRESHOLD) {
        return 0;
    }

But the hottest part is taking a memory block from the freelist in the pool. To optimize it:

* make pymalloc_alloc and pymalloc_free inline functions
* split code for rare / slow paths out into new functions

With PR 14674, pymalloc is now as fast as mimalloc in the spectral_norm benchmark.

$ ./python bm_spectral_norm.py --compare-to=./python-master
python-master: ..................... 199 ms +- 1 ms
python: ..................... 176 ms +- 1 ms

Mean +- std dev: [python-master] 199 ms +- 1 ms -> [python] 176 ms +- 1 ms: 1.13x faster (-11%)

----------
components: Interpreter Core
messages: 347615
nosy: inada.naoki
priority: normal
severity: normal
status: open
title: Optimize pymalloc for non PGO build
type: performance
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37543>
_______________________________________
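
For readers unfamiliar with the pattern, the idea in the report (inline the hot freelist path, move the rare path into a separate function) can be sketched roughly as below. This is a toy illustration only, not the code from PR 14674: the names toy_alloc, toy_alloc_slow, toy_free, BLOCK_SIZE and BLOCKS_PER_POOL are all hypothetical, the allocator has a single size class, and pools are never returned to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 32        /* hypothetical: one fixed size class */
#define BLOCKS_PER_POOL 64   /* hypothetical: blocks carved per pool */

/* A free block stores the link to the next free block in place. */
typedef struct block { struct block *next; } block;

static block *freelist = NULL;

/* Rare / slow path: allocate a new pool and thread its blocks onto the
 * freelist. Kept out of line so it does not bloat the inlined caller.
 * Only called when the freelist is empty. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
static void *toy_alloc_slow(void)
{
    char *pool = malloc((size_t)BLOCK_SIZE * BLOCKS_PER_POOL);
    if (pool == NULL) {
        return NULL;
    }
    /* Push blocks N-1 .. 1 so that block 1 ends up at the head. */
    for (size_t i = BLOCKS_PER_POOL - 1; i >= 1; i--) {
        block *b = (block *)(pool + i * BLOCK_SIZE);
        b->next = freelist;
        freelist = b;
    }
    return pool;  /* block 0 is handed to the caller directly */
}

/* Hot path: small enough that the compiler can inline it without
 * profile data. It only pops the head of the freelist. */
static inline void *toy_alloc(size_t nbytes)
{
    if (nbytes == 0 || nbytes > BLOCK_SIZE) {
        return NULL;  /* a real caller would fall back to malloc() */
    }
    block *b = freelist;
    if (b != NULL) {
        freelist = b->next;
        return b;
    }
    return toy_alloc_slow();
}

/* Hot path for free: push the block back onto the freelist. */
static inline void toy_free(void *p)
{
    assert(p != NULL);
    block *b = p;
    b->next = freelist;
    freelist = b;
}
```

The point of the split is that the inlined body of toy_alloc stays a handful of instructions, so even without PGO the compiler has little reason to refuse inlining it, while the pool setup cost is paid only inside toy_alloc_slow.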