STINNER Victor <vstin...@python.org> added the comment:

PyPy emits a warning when the timeit module is used, suggesting pyperf instead.

timeit uses the minimum, whereas pyperf uses the average (arithmetic mean).
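
Here is a quick stdlib-only sketch showing both statistics computed from the 
same raw timings (the workload and the repeat/number values are arbitrary, 
this is not how either tool is implemented internally):

import statistics
import timeit

# timeit.repeat() returns one total time (in seconds) per repetition
values = timeit.repeat("sorted(range(1000))", repeat=5, number=10_000)

print("minimum (what timeit reports):", min(values))
print("mean (what pyperf reports):", statistics.mean(values))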

timeit uses a single process, whereas pyperf spawns 21 processes: 1 just for 
loop calibration and 20 to compute values.

timeit computes 5 values, whereas pyperf computes 60 (3 values in each of the 
20 worker processes).

timeit uses all computed values, whereas pyperf ignores the first value, which 
is treated as a "warmup" value (the number of warmup values is configurable).

timeit doesn't compute the standard deviation, pyperf does. The standard 
deviation gives an idea of whether the benchmark looks reliable. IMO, results 
without a standard deviation should not be trusted.
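
Extending the stdlib sketch above, the spread is easy to compute as well; the 
10% threshold below is only an illustrative cut-off, not pyperf's actual 
heuristic:

import statistics
import timeit

values = timeit.repeat("sorted(range(1000))", repeat=5, number=10_000)
mean = statistics.mean(values)
stdev = statistics.stdev(values)

print(f"{mean:.6f} s +- {stdev:.6f} s per 10,000 calls")
if stdev > mean * 0.1:  # arbitrary 10% threshold, just for illustration
    print("large spread: the system was probably busy, rerun the benchmark")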

pyperf also emits a warning when a benchmark doesn't look reliable, for 
example if the user ran other workloads while the benchmark was running.

pyperf also supports storing results in a JSON file, which contains all values 
as well as metadata.
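
The file can then be loaded back through the pyperf API. The sketch below 
follows my reading of the pyperf docs (BenchmarkSuite.load(), 
get_benchmarks(), mean(), stdev(), get_values(), get_metadata()); verify the 
names against your installed version:

# Assumes bench.json was produced by a previous pyperf run, e.g.:
#   python -m pyperf timeit -o bench.json "sorted(range(1000))"
import pyperf

suite = pyperf.BenchmarkSuite.load("bench.json")
for bench in suite.get_benchmarks():
    print(bench.get_name())
    print("  mean:", bench.mean(), "stdev:", bench.stdev())
    print("  number of values:", len(bench.get_values()))
    print("  metadata keys:", sorted(bench.get_metadata()))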

I cannot force people to stop using timeit. But there are reasons why pyperf 
is more reliable than timeit.

Benchmarking is hard. See the pyperf documentation for hints on how to get 
reproducible benchmark results:
https://pyperf.readthedocs.io/en/latest/run_benchmark.html#how-to-get-reproducible-benchmark-results

Read also this important article ;-)
"Biased Benchmarks (honesty is hard)"
http://matthewrocklin.com/blog/work/2017/03/09/biased-benchmarks

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45261>
_______________________________________