Tim Holy <tim.h...@gmail.com> added the comment:

To be clear, the 0.08ns is not per function call: the operation is a loop, and 
it comes out to 0.08ns/element. I quoted that number to compare it to the CPU 
clock interval, which on my machine is ~0.4ns. Any operation that takes less 
than 1 clock cycle is suspicious, but not automatically wrong because of SIMD 
(if the compiler generates such instructions for this operation; I'm not sure 
how one checks that in Python). On my AVX2 processor, as many as 16 `uint16` 
values fit in a single 256-bit register, so you can't entirely rule out 
per-element times well below one clock cycle (although the need for load, 
manipulation, store, and increment means that it's not plausible to reach 
1/16th of a clock cycle).
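
The comparison above can be written out as a few lines of arithmetic. This is 
only a sketch: the two timings are the figures quoted in this comment, the 
256-bit width is the standard AVX2 register size, and the variable names are 
made up for illustration.

```python
# Figures quoted in the comment; variable names are illustrative only.
time_per_element_ns = 0.08   # measured time per array element
clock_interval_ns = 0.4      # approximate CPU clock interval on the reporter's machine
avx2_register_bits = 256     # standard AVX2 register width
uint16_bits = 16

# How many clock cycles one element costs, and how many elements fit in one cycle.
cycles_per_element = time_per_element_ns / clock_interval_ns
elements_per_cycle = clock_interval_ns / time_per_element_ns

# Upper bound on uint16 values handled by one AVX2 instruction.
simd_lanes = avx2_register_bits // uint16_bits
best_case_cycles_per_element = 1 / simd_lanes

print(f"cycles per element:  {cycles_per_element:.2f}")
print(f"elements per cycle:  {elements_per_cycle:.1f}")
print(f"uint16 lanes (AVX2): {simd_lanes}")
print(f"idealized floor:     {best_case_cycles_per_element:.4f} cycles/element")
```

The measured figure works out to roughly a fifth of a cycle per element, which 
is below one cycle but still well above the idealized 1/16 floor, so SIMD 
alone could account for it.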

Interestingly, increasing `number` does seem to make the timings consistent, 
without obvious transitions. I'm curious why the reported times are totals 
rather than "per number"; I find myself making comparisons using

    list(map(lambda tm: tm / 1000, t.repeat(repeat=nrep, number=1000)))

Should the documentation mention that each reported time should be divided by 
`number` to get the time for a single execution of the core operation?
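
As a minimal sketch of that division: `Timer.repeat()` returns one total per 
repetition, each covering `number` executions of the statement. The statement 
and setup below are hypothetical stand-ins, not the operation from the 
original report, and `nrep = 5` is an assumed value.

```python
import timeit

# Hypothetical stand-in for the operation being timed.
t = timeit.Timer("sum(data)", setup="data = list(range(1000))")

number = 1000  # executions of the statement per repetition
nrep = 5       # assumed number of repetitions

# Each entry is the total time in seconds for `number` executions.
totals = t.repeat(repeat=nrep, number=number)

# Divide by `number` to get the time for a single execution.
per_loop = [total / number for total in totals]

print("total per repetition (s):", totals)
print("per execution (s):       ", per_loop)
print("best per execution (s):  ", min(per_loop))
```

Using a list comprehension and a named `number` avoids repeating the literal 
1000 in two places, which is easy to get out of sync.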

However, in the bigger picture of things I suspect this should be closed. I'll 
let others chime in first, in case they think documentation or other things 
need to be changed.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45261>
_______________________________________
