New submission from Brandt Bucher <brandtbuc...@gmail.com>:
The attached PR adds a fast path for BINARY_ADD instructions involving two lists, where the left list has a refcount of exactly 1. In that case, we perform a PySequence_InPlaceConcat operation instead. This has the effect of avoiding quadratic complexity for list summations, by keeping only one intermediate result and extending it in place. (A rough sketch of the check appears after the benchmark results below.)

For (potentially large) lists:

    a + b + c + d ...

this is currently executed as:

    tmp = a + b
    tmp = tmp + c
    tmp = tmp + d
    ...

With this change, it is effectively executed as:

    tmp = a + b
    tmp += c
    tmp += d
    ...

Here are the pyperformance results (optimizations enabled):

Slower (6):
- pickle_list: 3.63 us +- 0.08 us -> 3.78 us +- 0.10 us: 1.04x slower (+4%)
- xml_etree_generate: 102 ms +- 2 ms -> 104 ms +- 3 ms: 1.02x slower (+2%)
- xml_etree_parse: 166 ms +- 4 ms -> 169 ms +- 9 ms: 1.02x slower (+2%)
- scimark_sparse_mat_mult: 5.42 ms +- 0.11 ms -> 5.47 ms +- 0.16 ms: 1.01x slower (+1%)
- pickle_dict: 23.6 us +- 0.8 us -> 23.8 us +- 0.4 us: 1.01x slower (+1%)
- scimark_fft: 413 ms +- 6 ms -> 416 ms +- 6 ms: 1.01x slower (+1%)

Faster (25):
- 2to3: 394 ms +- 18 ms -> 381 ms +- 8 ms: 1.03x faster (-3%)
- python_startup_no_site: 12.0 ms +- 0.2 ms -> 11.7 ms +- 0.2 ms: 1.03x faster (-3%)
- nqueens: 121 ms +- 2 ms -> 118 ms +- 3 ms: 1.03x faster (-2%)
- pathlib: 36.0 ms +- 1.3 ms -> 35.1 ms +- 0.9 ms: 1.02x faster (-2%)
- pickle_pure_python: 523 us +- 15 us -> 511 us +- 13 us: 1.02x faster (-2%)
- deltablue: 8.54 ms +- 0.25 ms -> 8.35 ms +- 0.27 ms: 1.02x faster (-2%)
- logging_silent: 220 ns +- 6 ns -> 215 ns +- 8 ns: 1.02x faster (-2%)
- raytrace: 566 ms +- 8 ms -> 554 ms +- 11 ms: 1.02x faster (-2%)
- hexiom: 11.2 ms +- 0.3 ms -> 11.0 ms +- 0.2 ms: 1.02x faster (-2%)
- go: 294 ms +- 6 ms -> 288 ms +- 5 ms: 1.02x faster (-2%)
- python_startup: 15.7 ms +- 0.4 ms -> 15.4 ms +- 0.5 ms: 1.02x faster (-2%)
- float: 136 ms +- 4 ms -> 134 ms +- 6 ms: 1.02x faster (-2%)
- scimark_sor: 232 ms +- 8 ms -> 228 ms +- 5 ms: 1.02x faster (-2%)
- fannkuch: 554 ms +- 8 ms -> 545 ms +- 9 ms: 1.02x faster (-2%)
- meteor_contest: 115 ms +- 2 ms -> 113 ms +- 3 ms: 1.02x faster (-2%)
- telco: 6.58 ms +- 0.22 ms -> 6.48 ms +- 0.15 ms: 1.02x faster (-2%)
- regex_effbot: 3.76 ms +- 0.08 ms -> 3.70 ms +- 0.09 ms: 1.02x faster (-2%)
- chaos: 129 ms +- 3 ms -> 127 ms +- 2 ms: 1.01x faster (-1%)
- regex_v8: 26.3 ms +- 0.4 ms -> 25.9 ms +- 0.6 ms: 1.01x faster (-1%)
- regex_compile: 200 ms +- 5 ms -> 197 ms +- 5 ms: 1.01x faster (-1%)
- logging_simple: 9.78 us +- 0.19 us -> 9.65 us +- 0.26 us: 1.01x faster (-1%)
- sqlite_synth: 3.24 us +- 0.10 us -> 3.20 us +- 0.06 us: 1.01x faster (-1%)
- unpickle: 16.4 us +- 0.5 us -> 16.2 us +- 0.5 us: 1.01x faster (-1%)
- logging_format: 10.8 us +- 0.2 us -> 10.7 us +- 0.3 us: 1.01x faster (-1%)
- json_loads: 30.2 us +- 0.8 us -> 29.9 us +- 0.6 us: 1.01x faster (-1%)

Benchmark hidden because not significant (14): json_dumps, nbody, pickle, pidigits, regex_dna, richards, scimark_lu, scimark_monte_carlo, spectral_norm, unpack_sequence, unpickle_list, unpickle_pure_python, xml_etree_iterparse, xml_etree_process

This is related to my earlier work in bpo-36229; however, this is a much less invasive change that is limited to ceval.c, where our knowledge of context is much better.
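For illustration only, here is a minimal sketch of the kind of check described above, written against the public C API. The helper name (list_add_fast) and its placement are hypothetical and not the actual ceval.c diff; it only shows the condition (both operands are exact lists, left refcount is 1) and the in-place fallback:

    #include <Python.h>

    /* Hypothetical helper illustrating the idea: if the left operand is a
     * list that nothing else can reference (refcount == 1), it is safe to
     * extend it in place instead of allocating a new result list. */
    static PyObject *
    list_add_fast(PyObject *left, PyObject *right)
    {
        if (PyList_CheckExact(left) && PyList_CheckExact(right)
            && Py_REFCNT(left) == 1)
        {
            /* Extends `left` in place and returns a new reference to it. */
            return PySequence_InPlaceConcat(left, right);
        }
        /* Otherwise, fall back to the normal addition protocol. */
        return PyNumber_Add(left, right);
    }

Because PySequence_InPlaceConcat resizes the existing list, a chain like a + b + c + d grows a single buffer instead of copying the accumulated result at every step, which is where the quadratic behavior goes away.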
----------
components: Interpreter Core
messages: 354399
nosy: brandtbucher
priority: normal
severity: normal
status: open
title: Improved performance for list addition.
type: performance
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38436>
_______________________________________