On 13 December 2011 21:39, William Stein <wst...@gmail.com> wrote:
> On Tue, Dec 13, 2011 at 1:15 PM, Nils Bruin <nbr...@sfu.ca> wrote:
>> I recall reading something about that in the Python documentation and
>> indeed, quoting from
>>
>> http://docs.python.org/library/timeit.html
>>
>> we find:
>>
>> """
>> Note
>>
>> It’s tempting to calculate mean and standard deviation from the result
>> vector and report these. However, this is not very useful. In a
>> typical case, the lowest value gives a lower bound for how fast your
>> machine can run the given code snippet; higher values in the result
>> vector are typically not caused by variability in Python’s speed, but
>> by other processes interfering with your timing accuracy. So the min()
>> of the result is probably the only number you should be interested in.
>> After that, you should look at the entire vector and apply common
>> sense rather than statistics.
>> """
>
> I now remember that too. However, I take that as the sort of typical
> thing an engineer who doesn't really understand statistics might say.
> They are concerned about outliers and the data not being normally
> distributed.
>
> The fact is that in practice the bound got from "the lowest value
> gives a lower bound" itself varies by quite a bit between calls to
> timeit. Should one just keep taking minimums?
>
> Given that processors are not deterministic and do speculative
> execution of instructions, etc., I'm even more dubious about the above
> quote.
>
> I've thrown Bill Hart in the cc, since he must have worried a lot
> about exactly this question when trying to make low level C/assembly
> code fast.
>
> -- William

For low-level assembly language we sometimes compute the exact number of
cycles using the cycle counter rather than do a timing. This varies per
architecture and assumes cache effects are not relevant.

For C we (used to) take many iterations and compute the minimum and
maximum times. If the two are close, the number of iterations is high,
and the machine is not under load, your problem is solved. If any of
those conditions is not met (and sometimes even if they all are), then
you may not know as much as you believe you do.

Things like processors in power-saving mode or variations in the speed
of processors on your cluster may cause massive variations or even just
meaningless timings.

If you are timing Python then your timing may be wildly affected by the
choice of language. :-)

Bill.
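
P.S. For what it's worth, here is a rough sketch in Python of the
"many iterations, compare minimum and maximum" check described above.
It is only an illustration, not the code we actually used: the helper
names time_spread and looks_trustworthy and the 5% tolerance are made up
for this example, and it cannot tell you whether the machine was under
load or in power-saving mode.

```python
import timeit


def time_spread(stmt, setup="pass", repeat=20, number=1000):
    """Time `stmt` in `repeat` batches of `number` calls each.

    Returns (min, max, spread), where min and max are per-call times in
    seconds and spread is (max - min) / min.
    """
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    lo, hi = min(per_call), max(per_call)
    # Guard against a zero minimum on a coarse timer.
    spread = (hi - lo) / lo if lo > 0 else float("inf")
    return lo, hi, spread


def looks_trustworthy(spread, tolerance=0.05):
    """Hypothetical rule of thumb: accept only if min and max are close."""
    return spread <= tolerance


if __name__ == "__main__":
    lo, hi, spread = time_spread("sum(range(1000))")
    print("min %.3e s  max %.3e s  spread %.1f%%" % (lo, hi, 100 * spread))
    if not looks_trustworthy(spread):
        print("min and max disagree; treat this timing with suspicion")
```

A small spread is necessary but not sufficient: if the whole run happens
on a throttled or slower processor, the minimum and maximum can agree
with each other and still be meaningless.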