Steven D'Aprano added the comment:

> * Display the average, rather than the minimum, of the timings *and*
>   display the standard deviation. It should help a little bit to get
>   more reproducible results.
I'm still not convinced that the average is the right statistic to use here. I cannot comment about Victor's perf project, but for timeit, it seems to me that Tim's original warning that the mean is not useful is correct.

Fundamentally, the problem with taking an average is that the timing errors are all one-sided. If the unknown "true" or "real" time taken by a piece of code is T, then the random error ε is always positive: we're measuring T + ε, not T ± ε.

If the errors are evenly divided into positive and negative, then on average the mean() or median() of the measurements will tend to cancel the errors, and you get a good estimate of T. But if the errors are all one-sided, then they don't cancel and you are actually estimating T plus some unknown, average error. In that case, min() is the estimate which is closest to T.

Unless you know that the average error is tiny compared to T, I don't think the average is very useful. Since these are typically micro-benchmarks, the error is often quite large relative to the unknown T.

> * Change the default repeat from 3 to 5 to have a better distribution
>   of timings. It makes the timeit CLI 66% slower (ex: 1 second instead
>   of 600 ms). That's the price of stable benchmarks :-)

I nearly always run with repeat=5, so I agree with this.

> * Don't disable the garbage collector anymore! Disabling the GC is not
>   fair: real applications use it.

But that's just adding noise: you're not timing the code snippet, you're timing the code snippet plus the garbage collector.

I disagree with this change, although I would accept it if there were an optional flag to control the gc.

> * autorange: start with 1 loop instead of 10 for slow benchmarks like
>   time.sleep(1)

That seems reasonable.

> * Display large number of loops as power of 10 for readability, ex:
>   "10^6" instead of "1000000". Also accept "10^6" syntax for the --num
>   parameter.

Shouldn't we use 10**6 or 1e6 rather than bitwise XOR? :-) This is aimed at Python programmers.
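For anyone who hasn't been bitten by this, here is a minimal check of what each spelling evaluates to in ordinary Python. The variable names are just for illustration:

```python
# In Python, ^ is bitwise XOR and ** is exponentiation.
xor_result = 10 ^ 6       # 0b1010 ^ 0b0110 == 0b1100
power_result = 10 ** 6    # integer exponentiation
float_result = 1e6        # float literal with the same value as 10 ** 6

print(xor_result)                     # 12
print(power_result)                   # 1000000
print(float_result == power_result)   # True
```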
We expect ** to mean exponentiation, not ^.

> * Add support for "ns" unit: nanoseconds (10^-9 second)

Seems reasonable.

----------
nosy: +steven.daprano
title: Enhance the timeit module: display average +- std dev instead of minimum -> Enhance the timeit module

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28240>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
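To make the one-sided error argument in the comment above concrete, here is a rough simulation sketch. It does not time real code: TRUE_TIME, MEAN_NOISE, simulate() and the exponential noise model are all assumptions chosen for illustration, picked only because the exponential distribution never produces a negative error.

```python
import random
import statistics

TRUE_TIME = 1.0     # hypothetical "true" cost T, in arbitrary units
MEAN_NOISE = 0.25   # hypothetical average size of the one-sided error

def simulate(repeat, rng):
    """Return `repeat` simulated measurements of the form T + eps, eps >= 0."""
    # expovariate(lambd) draws non-negative values with mean 1 / lambd.
    return [TRUE_TIME + rng.expovariate(1.0 / MEAN_NOISE) for _ in range(repeat)]

rng = random.Random(28240)   # fixed seed so the run is repeatable
samples = simulate(5, rng)   # repeat=5, as discussed in the comment

best = min(samples)
mean = statistics.mean(samples)
stdev = statistics.stdev(samples)

# Every sample is at least T, so min() can never be further from T than mean().
print(f"min  = {best:.3f}  (error {best - TRUE_TIME:+.3f})")
print(f"mean = {mean:.3f} +- {stdev:.3f}  (error {mean - TRUE_TIME:+.3f})")
```

Under this model both statistics overestimate T, and the mean overestimates it by at least as much as the minimum does. With real measurements, the list returned by timeit.repeat() could be summarised the same way, though real timing noise need not follow an exponential distribution.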