New submission from Inada Naoki <songofaca...@gmail.com>:
LOAD_METHOD avoids creating a temporary bound method object, and PyObject_CallMethodObjArgs now uses the same optimization. So I now think the free_list for bound method objects no longer gives enough performance benefit. When the free_list is not used often enough, the objects it holds may get in the way of obmalloc reusing memory pools.

Here is the performance difference from removing the free_list (built with LTO, without PGO; "patched" = free_list removed):

```
$ ./python -m pyperf compare_to master.json patched.json -G --min-speed=1
Slower (19):
- sqlite_synth: 4.03 us +- 0.10 us -> 4.20 us +- 0.08 us: 1.04x slower (+4%)
- genshi_text: 41.2 ms +- 0.4 ms -> 42.6 ms +- 0.4 ms: 1.03x slower (+3%)
- scimark_sparse_mat_mult: 6.29 ms +- 0.03 ms -> 6.50 ms +- 0.50 ms: 1.03x slower (+3%)
- mako: 26.5 ms +- 0.1 ms -> 27.4 ms +- 0.3 ms: 1.03x slower (+3%)
- html5lib: 130 ms +- 4 ms -> 134 ms +- 5 ms: 1.03x slower (+3%)
- genshi_xml: 83.4 ms +- 1.1 ms -> 85.6 ms +- 1.2 ms: 1.03x slower (+3%)
- pickle: 15.1 us +- 0.5 us -> 15.5 us +- 0.5 us: 1.03x slower (+3%)
- float: 161 ms +- 1 ms -> 165 ms +- 1 ms: 1.02x slower (+2%)
- logging_simple: 13.9 us +- 0.2 us -> 14.2 us +- 0.2 us: 1.02x slower (+2%)
- xml_etree_process: 108 ms +- 1 ms -> 110 ms +- 1 ms: 1.02x slower (+2%)
- pathlib: 28.0 ms +- 0.2 ms -> 28.5 ms +- 0.3 ms: 1.02x slower (+2%)
- pickle_pure_python: 703 us +- 8 us -> 715 us +- 7 us: 1.02x slower (+2%)
- sympy_expand: 553 ms +- 5 ms -> 563 ms +- 12 ms: 1.02x slower (+2%)
- xml_etree_generate: 136 ms +- 2 ms -> 138 ms +- 1 ms: 1.02x slower (+2%)
- logging_format: 15.3 us +- 0.2 us -> 15.5 us +- 0.2 us: 1.01x slower (+1%)
- json_dumps: 17.4 ms +- 0.1 ms -> 17.7 ms +- 0.2 ms: 1.01x slower (+1%)
- logging_silent: 266 ns +- 5 ns -> 269 ns +- 9 ns: 1.01x slower (+1%)
- django_template: 163 ms +- 1 ms -> 165 ms +- 2 ms: 1.01x slower (+1%)
- sympy_sum: 219 ms +- 2 ms -> 222 ms +- 2 ms: 1.01x slower (+1%)

Faster (6):
- regex_effbot: 4.51 ms +- 0.04 ms -> 4.44 ms +- 0.03 ms: 1.02x faster (-2%)
- pickle_list: 5.21 us +- 0.04 us -> 5.13 us +- 0.04 us: 1.01x faster (-1%)
- crypto_pyaes: 164 ms +- 1 ms -> 162 ms +- 1 ms: 1.01x faster (-1%)
- xml_etree_parse: 202 ms +- 7 ms -> 200 ms +- 3 ms: 1.01x faster (-1%)
- scimark_sor: 287 ms +- 6 ms -> 284 ms +- 6 ms: 1.01x faster (-1%)
- raytrace: 758 ms +- 26 ms -> 750 ms +- 11 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (35)
```

I think a free_list is worth keeping only when several pyperformance benchmarks show more than a 5% speedup. The benefit here is smaller than that threshold. I will run pyperformance again after bpo-37337 is merged.

FWIW, in the case of sqlite_synth, I think the performance difference comes from here:
https://github.com/python/cpython/blob/015000165373f8db263ef5bc682f02d74e5782ac/Modules/_sqlite/connection.c#L662

If the performance of the user-defined aggregate feature is really important, we can optimize it further.

----------
components: Interpreter Core
messages: 346040
nosy: inada.naoki, jdemeyer
priority: normal
severity: normal
status: open
title: remove free_list for bound method objects
type: performance
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37340>
_______________________________________
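To illustrate what a bound-method free_list trades off, here is a minimal Python sketch. Part 1 shows real CPython behavior: a plain attribute access such as `c.bump` builds a new bound method object each time. That is the object LOAD_METHOD avoids creating for a direct call like `c.bump()`. Part 2 is a toy model of a bounded free list. `Counter`, `FreeList`, `acquire`, `release`, and `max_free` are hypothetical names chosen for illustration. They are not CPython's API, and CPython's real free list lives in C code inside the interpreter.

```python
# Part 1: a plain attribute access builds a fresh bound method object each time.
class Counter:
    def bump(self):
        return 1

c = Counter()
m1 = c.bump
m2 = c.bump
print(m1 is m2)                                        # False: two separate objects
print(m1.__self__ is c, m1.__func__ is Counter.bump)   # True True


# Part 2: a toy bounded free list (hypothetical class, not CPython's C code).
class FreeList:
    def __init__(self, factory, max_free=4):
        self._factory = factory    # creates a new object on a cache miss
        self._free = []            # released objects kept around for reuse
        self._max_free = max_free  # cap so the cache cannot grow without bound
        self.reused = 0            # acquisitions served from the cache

    def acquire(self):
        # Prefer a cached object; fall back to a fresh allocation.
        if self._free:
            self.reused += 1
            return self._free.pop()
        return self._factory()

    def release(self, obj):
        # Keep the object only while there is room; otherwise drop it so the
        # underlying allocator can reclaim the memory.
        if len(self._free) < self._max_free:
            self._free.append(obj)


pool = FreeList(dict, max_free=2)
a = pool.acquire()          # cache empty: factory() is called
pool.release(a)             # a goes onto the free list
b = pool.acquire()          # served from the free list
print(b is a, pool.reused)  # True 1
```

In this model the cache only pays off when `acquire`/`release` happen frequently. If most method calls go through LOAD_METHOD and never create a bound method, the cached objects mostly sit idle while still holding memory. That is roughly the trade-off raised above.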