On 2 February 2016 at 05:38, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
>
> In effect, each measurement you take is made up of two components:
>
> * the actual time that the code would take if it had exclusive
>   access to the machine with no other programs running, call it t;
>
> * and the noise added by the system, call it Δt.
>
> It is impossible to measure t alone, you always get t+Δt. The aim in
> timing is to reduce the effect of the noise as much as possible.

That's a reasonable model in some cases but not all. In particular the
"actual time" part is itself not a constant but rather depends on all
sorts of things such as RAM caching. On the CPython side there could be
all kinds of optimisations that would kick in at different times to
change the "actual time". Likewise there are optimisations at the OS
and hardware levels that kick in at different times.

> The way to do this with timeit is:
>
> - don't run extraneous code as part of your test (just measure what
>   you care about, with as little scaffolding around it);
>
> - run that code snippet as many times as you can bear in a loop,
>   *but* let timeit manage the loop; don't try doing it by hand;

There are many situations where running something in a loop over and
over triggers all kinds of optimisations. PyPy's JIT warmup is an
example, but the effect can occur at the hardware level as well.
Performance of code run repeatedly in a tight loop can be different to
that of the same code called once, or called sporadically in a
long-running process. This is especially true of code that operates on
a data structure: in real usage the data structure might be accessed
rarely, but a tight profiling loop forces the entire structure into the
CPU's caches. The sketch below illustrates the difference.
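As a rough sketch of what I mean (my own made-up example, nothing from
timeit itself): target, data and other_work are invented names, and
whether you see a measurable difference, and how big, depends entirely
on your machine and interpreter:

import timeit
from time import perf_counter

data = list(range(10**6))

def target():
    # The code under test: a simple pass over a largish structure.
    return sum(data)

def other_work():
    # Touch a lot of unrelated memory so that `data` has a chance of
    # being evicted from the CPU caches between measurements.
    return sum(bytearray(10**7))

n = 20

# 1) Tight loop: timeit calls target() back to back, which keeps the
# list hot in the caches (this is what timeit is designed to do).
tight = timeit.timeit(target, number=n) / n
print("tight loop:    %.6f s per call" % tight)

# 2) Sporadic calls: do unrelated work between measurements and time
# each call individually.
times = []
for _ in range(n):
    other_work()
    start = perf_counter()
    target()
    times.append(perf_counter() - start)
print("sporadic best: %.6f s per call" % min(times))

For pure Python code the interpreter overhead may well swamp the cache
effect, but the point is that the tight loop measures a best case for
memory locality.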
> - only care about "best of N", where N is as large a number as you
>   can stand;
>
> - averages, standard deviation, and other statistics are pretty
>   much useless here, the only statistic that you care about is
>   the minimum.

This argument makes sense if you adopt the t+Δt model described above,
but that model is just a model. The minimum from micro-benchmarking in
a tight loop is not necessarily a more representative statistic than
the mean, the median or anything else. The standard deviation is
actually useful for assessing the variability of the time taken, which
can also be important and gives additional information about
performance: ideally I would like to minimise both the expected time
taken and its variability simultaneously, if possible. There's a
sketch of pulling all of those statistics out of timeit in the P.S.
below.

You seem to be describing timeit's way of profiling, which is
reasonable, but it's not the only way.

--
Oscar
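P.S. Here's a rough sketch of getting those other statistics out of
timeit rather than just the minimum (my own code; the statistics
module is in the stdlib from 3.4, and the snippet being timed is just
a placeholder):

import timeit
from statistics import mean, median, stdev

# timeit.repeat returns one *total* time per repeat, so divide by
# `number` to get per-call times.
number = 1000
raw = timeit.repeat("sorted(data)",
                    setup="data = list(range(1000))",
                    repeat=30, number=number)
times = [t / number for t in raw]

print("min:    %.3g s" % min(times))
print("mean:   %.3g s" % mean(times))
print("median: %.3g s" % median(times))
print("stdev:  %.3g s" % stdev(times))

A stdev that is large relative to the mean tells you the timing is
unstable, which the minimum alone won't show.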