Steven D'Aprano <steve+pyt...@pearwood.info> added the comment:
Thank you Victor, I don't know whether I can convince you or you will
convince me, but this is very interesting one way or the other. And I
hope that Tim Holy is interested too :-)

On Wed, Sep 22, 2021 at 01:51:35PM +0000, STINNER Victor wrote:

> > But timing results are not like that, the measurement errors are
> > one-sided, not two: (..)
>
> I suggest you to run tons of benchmarks and look at the distribution.
> The reality is more complex than what you may think.

We are trying to find the ideal time that the code would take with no
external factors slowing it: no noise, no other processes running, no
operating system overhead. The baseline time it would take to run the
code if the CPU could dedicate every cycle to your code and nothing
else. (Obviously we cannot measure that time on a real computer, where
there are always other processes running, but we want the best
estimate of that time that we can reach.)

There may still be conditions that can shift the baseline, e.g.
whether the data fits in the CPU cache or not, but that is not
*environmental* noise, except in so far as the impact of other
processes might push your data out of the cache.

So on a noisy system (a real computer) there is some ideal time that
the code would take to run if the CPU were dedicated to running your
code only, and some actual time (ideal time + environmental noise)
that we can measure. The environmental noise is never negative: if it
could be, that would mean that running more processes on the machine
could make your code faster, and that cannot be.

> > measurement = true value + random error
>
> In my experience, there is no single "true value", they are multiple
> values. Concrete example where the Python randomized hash function
> gives different value each time you spawn a new Python process:
>
> https://vstinner.github.io/journey-to-stable-benchmark-average.html
>
> Each process has its own "true value", but pyperf spawns 20 Python
> processes :-)

Thank you for the link, I have read it. Of course the more processes
you spawn, the more environmental noise there is, which means more
noise in your measurements. If you run once with 10 processes and
again with 100 processes, you probably won't get the same results for
the two runs. (Sorry, I have not tried this myself. Consider this a
prediction.)

You say:

    "But how can we compare performances if results are random? Take
    the minimum? No! You must never (ever again) use the minimum for
    benchmarking! Compute the average and some statistics like the
    standard deviation"

but you give no reason why one should not use the minimum. Without a
reason why the minimum is *wrong*, I cannot accept that prohibition.

> Here is report is about a value lower than a single nanosecond:
> "0.08ns/element".

An Intel i7 CPU performs at approximately 220000 MIPS, that is, about
2.2e11 instructions per second, so it can execute roughly
2.2e11 * 0.08e-9 = 17 or 18 CPU instructions in 0.08 ns. (Depending on
the instruction, of course.) An AMD Ryzen Threadripper may be ten
times as fast. Without knowing the CPU and the code, I cannot tell
whether to be impressed or shocked at that time.

> Again, good luck with benchmarking, it's a hard problem ;-)

Thank you, and I do appreciate all the work you have put into it, even
if you have not convinced me to stop using the minimum.
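
P.S. For what it's worth, here is a toy simulation of the one-sided
noise model I described above. It is only a sketch: the baseline and
the exponential noise distribution are invented for illustration, not
taken from any real benchmark. Under one-sided noise, the minimum
converges on the baseline, while the mean is biased upward by the
average noise:

import random
import statistics

# Hypothetical model: every timing is a fixed baseline plus
# non-negative environmental noise.  The exponential distribution
# is an arbitrary choice for the example; real noise need not
# look like this.
BASELINE = 100.0  # the ideal time, in arbitrary units

def one_timing():
    # Noise is one-sided: other processes can only slow us down.
    return BASELINE + random.expovariate(1 / 5.0)

timings = [one_timing() for _ in range(1000)]

print("baseline:", BASELINE)
print("minimum: ", min(timings))              # approaches the baseline
print("mean:    ", statistics.mean(timings))  # baseline + mean noise
print("stdev:   ", statistics.stdev(timings))

Of course, this assumes there *is* a single baseline, which is exactly
what Victor disputes.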
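
To see the "multiple true values" effect Victor describes, one can
spawn several interpreters, each of which picks its own random hash
seed, much as pyperf spawns worker processes. Another sketch; the
workload and the run count here are arbitrary choices for the example:

import subprocess
import sys

# Each new interpreter gets a fresh random hash seed, so code that
# hashes many strings can have a slightly different baseline in
# each process.  This is one reason pyperf spawns multiple workers.
cmd = [sys.executable, "-m", "timeit",
       "-s", "words = [str(i) for i in range(10000)]",
       "set(words)"]

for _ in range(5):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout.strip())

Whether the per-process differences stand out above the ordinary
environmental noise will depend on the machine; the point is only
that each process can have its own baseline.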