STINNER Victor <vstin...@python.org> added the comment:

PyPy emits a warning when the timeit module is used, suggesting pyperf instead.

timeit uses the minimum, whereas pyperf uses the average (arithmetic mean).
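
Here is a quick stdlib-only sketch showing both statistics computed from the 
same raw timings (the workload and the repeat/number values are arbitrary, 
this is not how either tool is implemented internally):

import statistics
import timeit

# timeit.repeat() returns one total time (in seconds) per repetition
values = timeit.repeat("sorted(range(1000))", repeat=5, number=10_000)

print("minimum (what timeit reports):", min(values))
print("mean (what pyperf reports):", statistics.mean(values))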

timeit uses a single process, whereas pyperf spawns 21 processes: 1 just for 
loop calibration and 20 to compute values.

timeit computes 5 values, whereas pyperf computes 60 (3 values in each of the 
20 worker processes).

timeit uses all computed values, whereas pyperf ignores the first value, which 
is treated as a "warmup" value (the number of warmup values is configurable).

timeit doesn't compute the standard deviation, pyperf does. The standard 
deviation gives an idea of whether the benchmark looks reliable. IMO, results 
without a standard deviation should not be trusted.
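
Extending the stdlib sketch above, the spread is easy to compute as well; the 
10% threshold below is only an illustrative cut-off, not pyperf's actual 
heuristic:

import statistics
import timeit

values = timeit.repeat("sorted(range(1000))", repeat=5, number=10_000)
mean = statistics.mean(values)
stdev = statistics.stdev(values)

print(f"{mean:.6f} s +- {stdev:.6f} s per 10,000 calls")
if stdev > mean * 0.1:  # arbitrary 10% threshold, just for illustration
    print("large spread: the system was probably busy, rerun the benchmark")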

pyperf also emits a warning when a benchmark doesn't look reliable, for 
example if the user ran other workloads while the benchmark was running.

pyperf also supports storing results in a JSON file, which contains all values 
as well as metadata.
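
The file can then be loaded back through the pyperf API. The sketch below 
follows my reading of the pyperf docs (BenchmarkSuite.load(), 
get_benchmarks(), mean(), stdev(), get_values(), get_metadata()); verify the 
names against your installed version:

# Assumes bench.json was produced by a previous pyperf run, e.g.:
#   python -m pyperf timeit -o bench.json "sorted(range(1000))"
import pyperf

suite = pyperf.BenchmarkSuite.load("bench.json")
for bench in suite.get_benchmarks():
    print(bench.get_name())
    print("  mean:", bench.mean(), "stdev:", bench.stdev())
    print("  number of values:", len(bench.get_values()))
    print("  metadata keys:", sorted(bench.get_metadata()))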

I cannot force people to stop using timeit. But there are reasons why pyperf 
is more reliable than timeit.

Benchmarking is hard. See the pyperf documentation for hints on how to get 
reproducible benchmark results:
https://pyperf.readthedocs.io/en/latest/run_benchmark.html#how-to-get-reproducible-benchmark-results

Read also this important article ;-)
"Biased Benchmarks (honesty is hard)"
http://matthewrocklin.com/blog/work/2017/03/09/biased-benchmarks

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45261>
_______________________________________