STINNER Victor <vstin...@python.org> added the comment:

pyperformance comparison between:

* commit dc24b8a2ac32114313bae519db3ccc21fe45c982 (before the "Make tuple
free list per-interpreter" change)
* PR 20645 (dict free lists), which accumulates all free list changes (those
already committed plus the PR)

Extract of the tested patch, new PyInterpreterState members:
--------------------
diff --git a/Include/internal/pycore_interp.h b/Include/internal/pycore_interp.h
index f04ea330d0..b1a25e0ed4 100644
--- a/Include/internal/pycore_interp.h
+++ b/Include/internal/pycore_interp.h
(...)
@@ -157,6 +233,18 @@ struct _is {
     */
     PyLongObject* small_ints[_PY_NSMALLNEGINTS + _PY_NSMALLPOSINTS];
 #endif
+    struct _Py_unicode_state unicode;
+    struct _Py_float_state float_state;
+    /* Using a cache is very effective since typically only a single slice is
+       created and then deleted again. */
+    PySliceObject *slice_cache;
+
+    struct _Py_tuple_state tuple;
+    struct _Py_list_state list;
+    struct _Py_dict_state dict_state;
+    struct _Py_frame_state frame;
+    struct _Py_async_gen_state async_gen;
+    struct _Py_context_state context;
 };
--------------------
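
For context, here is a minimal, self-contained C sketch of what "free list
per interpreter" means: a small cache of freed objects that used to live in
a static global is embedded in the interpreter state instead, so each
interpreter gets its own cache. All names below (example_state,
interp_state, node_alloc, node_free, MAXFREELIST) are invented for
illustration and do not match CPython's actual internals; the quoted diff
above shows the real struct members.
--------------------
#include <stdlib.h>

#define MAXFREELIST 80   /* cap on cached objects, like CPython's free lists */

typedef struct Node {
    struct Node *next;   /* link used while the node sits on the free list */
    /* ... object payload ... */
} Node;

/* Per-interpreter free-list state, analogous to the _Py_tuple_state,
   _Py_float_state, ... members added in the quoted diff. */
struct example_state {
    Node *free_list;     /* singly linked stack of freed nodes */
    int numfree;
};

/* Stand-in for PyInterpreterState: the interpreter state embeds the
   free-list state instead of relying on a static global. */
struct interp_state {
    struct example_state example;
};

static Node *
node_alloc(struct interp_state *interp)
{
    struct example_state *st = &interp->example;
    if (st->free_list != NULL) {        /* cache hit: reuse a freed node */
        Node *n = st->free_list;
        st->free_list = n->next;
        st->numfree--;
        return n;
    }
    return malloc(sizeof(Node));        /* cache miss: real allocation */
}

static void
node_free(struct interp_state *interp, Node *n)
{
    struct example_state *st = &interp->example;
    if (st->numfree < MAXFREELIST) {    /* keep the node for later reuse */
        n->next = st->free_list;
        st->free_list = n;
        st->numfree++;
    }
    else {
        free(n);                        /* cache is full: give memory back */
    }
}

int main(void)
{
    struct interp_state interp = { { NULL, 0 } };
    Node *n = node_alloc(&interp);      /* malloc: free list is empty */
    node_free(&interp, n);              /* goes onto the per-interp cache */
    Node *m = node_alloc(&interp);      /* same node comes back from cache */
    free(m);
    return 0;
}
--------------------
The point of the change is only the move from static globals into the
interpreter struct; the caching logic itself stays the same, which is why a
large speedup or slowdown on any single benchmark is not expected from it.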

Results:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G
Slower (10):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)
- python_startup_no_site: 8.71 ms +- 0.77 ms -> 8.94 ms +- 0.91 ms: 1.03x slower (+3%)
- xml_etree_process: 130 ms +- 1 ms -> 133 ms +- 2 ms: 1.02x slower (+2%)

Faster (9):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)
- django_template: 123 ms +- 16 ms -> 119 ms +- 2 ms: 1.04x faster (-3%)
- xml_etree_generate: 160 ms +- 4 ms -> 156 ms +- 3 ms: 1.02x faster (-2%)
- xml_etree_iterparse: 178 ms +- 3 ms -> 177 ms +- 2 ms: 1.01x faster (-1%)

Benchmark hidden because not significant (41): (...)
--------------------

If we ignore differences smaller than 5%:
--------------------
$ python3 -m pyperf compare_to 2020-06-04_20-10-master-dc24b8a2ac32.json.gz 2020-06-04_20-10-master-dc24b8a2ac32-patch-free_lists.json.gz -G --min-speed=5
Slower (8):
- chameleon: 20.1 ms +- 0.4 ms -> 23.1 ms +- 4.0 ms: 1.15x slower (+15%)
- logging_silent: 334 ns +- 51 ns -> 371 ns +- 70 ns: 1.11x slower (+11%)
- spectral_norm: 274 ms +- 37 ms -> 302 ms +- 55 ms: 1.10x slower (+10%)
- logging_format: 22.5 us +- 0.4 us -> 24.5 us +- 2.7 us: 1.09x slower (+9%)
- json_dumps: 26.6 ms +- 4.0 ms -> 28.7 ms +- 5.5 ms: 1.08x slower (+8%)
- sympy_sum: 390 ms +- 3 ms -> 415 ms +- 45 ms: 1.06x slower (+6%)
- float: 217 ms +- 3 ms -> 231 ms +- 30 ms: 1.06x slower (+6%)
- pidigits: 306 ms +- 32 ms -> 323 ms +- 47 ms: 1.06x slower (+6%)

Faster (6):
- pickle_pure_python: 1.05 ms +- 0.16 ms -> 964 us +- 19 us: 1.09x faster (-9%)
- scimark_sparse_mat_mult: 11.4 ms +- 2.1 ms -> 10.5 ms +- 1.7 ms: 1.09x faster (-8%)
- hexiom: 19.5 ms +- 4.1 ms -> 18.0 ms +- 3.0 ms: 1.08x faster (-7%)
- telco: 15.7 ms +- 3.1 ms -> 14.5 ms +- 0.4 ms: 1.08x faster (-7%)
- unpickle: 31.8 us +- 5.7 us -> 29.5 us +- 4.9 us: 1.08x faster (-7%)
- scimark_lu: 292 ms +- 60 ms -> 274 ms +- 34 ms: 1.07x faster (-6%)

Benchmark hidden because not significant (46): (...)
--------------------

Honestly, I'm surprised by these results. I don't see how these free list
changes could make 6 to 9 benchmarks faster (e.g. 1.08x faster for telco!?).
To me, it sounds like the speed.python.org runner has some trouble. You can
notice it if you look at the last 3 runs at https://speed.python.org/: there
are some spikes (in both directions, faster or slower) which are very
surprising.

Pablo recently upgraded Ubuntu on the benchmark runner server. I don't know
if it's related.

I plan to recompute all benchmark runs on the benchmark runner server, since
over the last years pyperf and pyperformance were upgraded multiple times
(old data were computed with old versions) and the system (Ubuntu) was
upgraded (again, old data were computed with older Ubuntu packages).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40521>
_______________________________________