STINNER Victor <vstin...@python.org> added the comment:
> But timing results are not like that, the measurement errors are one-sided, not two: (..)

I suggest you run lots of benchmarks and look at the distribution. The reality is more complex than you may think.

> measurement = true value + random error

In my experience, there is no single "true value": there are multiple values. Here is a concrete example where the Python randomized hash function gives a different value each time you spawn a new Python process:
https://vstinner.github.io/journey-to-stable-benchmark-average.html

Each process has its own "true value", but pyperf spawns 20 Python processes :-) (A small sketch reproducing the hash randomization is at the end of this message.)

There are multiple sources of randomness, not only the Python randomized hash function. On Linux, the process address space is randomized by ASLR, which may give different timings at each run. Code placement, exact memory addresses, etc.: many things come into play when you look at functions which take less than 100 ns. Here, the report is about a value lower than a single nanosecond: "0.08 ns/element".

--

I wrote articles about benchmarking:
https://vstinner.readthedocs.io/benchmark.html#my-articles

I gave a talk about it:

* https://raw.githubusercontent.com/vstinner/talks/main/2017-FOSDEM-Brussels/howto_run_stable_benchmarks.pdf
* https://archive.fosdem.org/2017/schedule/event/python_stable_benchmark/

Again, good luck with benchmarking, it's a hard problem ;-)

--

Once you think that you know everything about benchmarking, you should read the following paper and cry:
https://arxiv.org/abs/1602.00602

See also my analysis of PyPy performance:
https://vstinner.readthedocs.io/pypy_warmups.html
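To illustrate the per-process hash randomization mentioned above, here is a minimal sketch (my own illustration, not code from pyperf or the linked article): it spawns a few fresh interpreters and prints hash('abc') in each one. Unless PYTHONHASHSEED is fixed, each process reports a different hash, and dict/set layouts change with it, which is one reason a benchmark has no single "true value" across processes.

    import subprocess
    import sys

    code = "print(hash('abc'))"
    for i in range(5):
        # Each spawned interpreter picks a new random hash seed (unless
        # PYTHONHASHSEED is set in the environment), so hash('abc') changes
        # from process to process; dict/set layouts, and thus some timings,
        # change with it.
        out = subprocess.run([sys.executable, "-c", code],
                             capture_output=True, text=True, check=True)
        print(f"process {i}: hash('abc') = {out.stdout.strip()}")

This is part of why pyperf spawns multiple worker processes, as mentioned above: sampling several processes covers several hash seeds and memory layouts instead of a single one.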