Tim Holy <tim.h...@gmail.com> added the comment:
To make sure it's clear: it's not 0.08ns per function call. It's a loop, and it comes out to 0.08ns per element. The purpose of quoting that number was to compare it to the CPU clock interval, which on my machine is ~0.4ns. Any operation that takes less than one clock cycle is suspicious, but not automatically wrong, because of SIMD (assuming the compiler generates such instructions for this operation; I'm not sure how one checks that in Python). On my AVX2 processor, as many as 16 `uint16` values fit in a single 256-bit register, so you can't entirely rule out times well below one clock cycle (although the need for load, manipulation, store, and increment means that it's not plausible to reach 1/16th of a clock cycle).

Interestingly, increasing `number` does seem to make the results consistent, without obvious transitions.

I'm curious why the reported times are totals rather than "per number"; I find myself making comparisons using

    list(map(lambda tm: tm / 1000, t.repeat(repeat=nrep, number=1000)))

Should the documentation mention that the timing of the core operation should be divided by `number`?

However, in the bigger picture of things I suspect this should be closed. I'll let others chime in first, in case they think documentation or other things need to be changed.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45261>
_______________________________________
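
For reference, the normalization described in this comment can be sketched as follows. This is a minimal illustration, not the benchmark from the issue: the helper name `per_loop_times`, the list-comprehension workload, and the array size are all assumptions made up for the example, and the 0.4ns clock interval is the approximate figure quoted above. It relies only on the fact that `timeit.Timer.repeat` returns the total time for `number` executions in each repetition.

```python
import timeit


def per_loop_times(timer, number=1000, repeat=5):
    """Return seconds per single execution of the statement, one value per repetition.

    `per_loop_times` is a hypothetical helper, not part of the timeit module.
    """
    # Timer.repeat() returns, for each repetition, the TOTAL time of `number` runs.
    totals = timer.repeat(repeat=repeat, number=number)
    return [total / number for total in totals]


# Hypothetical workload (assumption): add 1 to every element of a small list.
n_elements = 1000
timer = timeit.Timer(
    "[x + 1 for x in data]",
    setup=f"data = list(range({n_elements}))",
)

per_loop = per_loop_times(timer, number=100, repeat=5)

# Divide again by the element count to get a per-element figure in nanoseconds.
per_element_ns = [t / n_elements * 1e9 for t in per_loop]

clock_ns = 0.4  # approximate clock interval quoted in the comment above
best = min(per_element_ns)
print(f"best: {best:.2f} ns/element, about {best / clock_ns:.1f} clock cycles")
```

Taking the minimum over the repetitions is one common choice for summarizing the results; a per-element figure far below one clock cycle would be the kind of result the comment treats as suspicious.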