On 13 December 2011 21:39, William Stein <wst...@gmail.com> wrote:
> On Tue, Dec 13, 2011 at 1:15 PM, Nils Bruin <nbr...@sfu.ca> wrote:
>> I recall reading something about that in the Python documentation and
>> indeed, quoting from
>>
>> http://docs.python.org/library/timeit.html
>>
>> we find:
>>
>> """
>> Note
>>
>> It’s tempting to calculate mean and standard deviation from the result
>> vector and report these. However, this is not very useful. In a
>> typical case, the lowest value gives a lower bound for how fast your
>> machine can run the given code snippet; higher values in the result
>> vector are typically not caused by variability in Python’s speed, but
>> by other processes interfering with your timing accuracy. So the min()
>> of the result is probably the only number you should be interested in.
>> After that, you should look at the entire vector and apply common
>> sense rather than statistics.
>> """
>
> I now remember that too.  However, I take that as the sort of typical
> thing an engineer who doesn't really understand statistics might say.
> They are concerned about outliers and the data not being normally
> distributed.
>
> The fact is that in practice the bound obtained from "the lowest value
> gives a lower bound" itself varies by quite a bit between calls to
> timeit.   Should one just keep taking minimums?
>
> Given that processors are not deterministic and do speculative
> execution of instructions, etc., I'm even more dubious about the above
> quote.
>
> I've thrown Bill Hart in the cc, since he must have worried a lot
> about exactly this question when trying to make low level C/assembly
> code fast.
>
>  -- William

For low-level assembly language we sometimes compute the exact number
of cycles using the cycle counter rather than doing a timing. This
varies per architecture and assumes cache effects are not relevant.

For C we (used to) take many iterations and compute minimum and
maximum times. If the two are close and the number of iterations is
high and the machine is not under load, your problem is solved. If any
of those conditions is not met (and sometimes even if they all are)
then you may not know as much as you believe you do. Things like
processors in power saving mode or variations in the speed of
processors on your cluster may cause massive variations or even just
meaningless timings.
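
As a rough illustration of the min/max check described above, here is a
minimal sketch in Python using the standard timeit module (the same idea
carries over to C). The helper names min_max_timing and looks_trustworthy
and the 5% tolerance are my own assumptions for the example, not anything
from Sage or from our C timing code, and a small spread is only a hint
that the timing is usable, not a proof.

```python
import timeit


def min_max_timing(stmt, setup="pass", repeat=5, number=1000):
    """Time `stmt` `repeat` times, `number` loops each.

    Returns (min, max, spread) where min and max are seconds per loop
    and spread is (max - min) / min.
    """
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    lo, hi = min(per_loop), max(per_loop)
    # Guard against a zero minimum on very coarse timers.
    spread = (hi - lo) / lo if lo > 0 else float("inf")
    return lo, hi, spread


def looks_trustworthy(spread, tolerance=0.05):
    """Hypothetical rule of thumb: min and max agree to within `tolerance`."""
    return spread <= tolerance


if __name__ == "__main__":
    lo, hi, spread = min_max_timing("sum(range(1000))", repeat=7, number=2000)
    print("min %.3g s, max %.3g s, spread %.1f%%" % (lo, hi, 100 * spread))
    if not looks_trustworthy(spread):
        print("min and max disagree; machine may be loaded or throttled")
```

If the spread is large, rerunning on an idle machine with more
iterations is probably more informative than reporting any single number.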

If you are timing Python then your timing may be wildly affected by
the choice of language. :-)

Bill.
