Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

Here's a little more performance data that might suggest where possible speed
optimizations may lie (I was mostly going for accuracy improvements in this
patch).

On my 2.6GHz (3.6GHz burst) Haswell, the hypot() function for n arguments takes
about 11*n+60 ns per call.
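
A rough way to reproduce a linear estimate like this from pure Python is
sketched below.  It is not how the numbers above were gathered; the helper
names time_hypot_ns() and fit_line() are made up for illustration, and it
assumes Python 3.8+ so that math.hypot() accepts more than two arguments.

```python
import math
import timeit


def time_hypot_ns(n, number=200_000, repeat=5):
    """Best-of-`repeat` time per math.hypot() call with n float arguments, in ns."""
    args = [float(i + 1) for i in range(n)]
    timer = timeit.Timer(lambda: math.hypot(*args))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    sizes = [2, 4, 8, 16, 32]
    timings = [time_hypot_ns(n) for n in sizes]
    slope, intercept = fit_line(sizes, timings)
    print(f"about {slope:.1f}*n + {intercept:.1f} ns per call")
```

The lambda and the *args unpacking add their own overhead, so both
coefficients will come out somewhat higher than a measurement taken inside
the C function, and they will differ from machine to machine.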

The 60 ns fixed portion goes to function call overhead, manipulating native
Python objects scattered all over memory, Inf/NaN handling, and the external
calls to __PyArg_ParseStack(), PyObject_Malloc(), PyFloat_AsDouble(),
PyObject_Free(), and PyFloat_FromDouble().

The inlined summation routine accesses native C doubles in consecutive memory
addresses.  Per Agner Fog's instruction timing tables, the DIVSD takes 10-13
cycles, which is about 3 ns; the MULSD takes 5 cycles, which is about 2 ns; and
ADDSD/SUBSD each have a 3-cycle latency for another 1 ns each.  That accounts
for most of the 11 ns per-argument variable portion of the running time.
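
The cycle-to-nanosecond arithmetic can be checked with a short sketch.  The
cycle counts are the ones quoted above; the assumptions are that the 3.6GHz
burst clock applies while timing, that the latencies add up serially, and
that each instruction runs once per argument (the ONE_OF_EACH mix is
hypothetical, not taken from the patch).

```python
CLOCK_GHZ = 3.6  # assumed: the burst clock is in effect during the loop

# (low, high) cycle counts per instruction, as quoted in the message.
CYCLES = {
    "DIVSD": (10, 13),
    "MULSD": (5, 5),
    "ADDSD": (3, 3),
    "SUBSD": (3, 3),
}

# Hypothetical instruction mix: one execution of each per argument.
ONE_OF_EACH = {"DIVSD": 1, "MULSD": 1, "ADDSD": 1, "SUBSD": 1}


def cycles_to_ns(cycles, clock_ghz=CLOCK_GHZ):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles / clock_ghz


def per_argument_ns(counts, clock_ghz=CLOCK_GHZ):
    """Return (low, high) ns per argument, treating the latencies as additive."""
    low = sum(cycles_to_ns(CYCLES[op][0], clock_ghz) * k for op, k in counts.items())
    high = sum(cycles_to_ns(CYCLES[op][1], clock_ghz) * k for op, k in counts.items())
    return low, high


if __name__ == "__main__":
    low, high = per_argument_ns(ONE_OF_EACH)
    print(f"{low:.1f} to {high:.1f} ns per argument")
```

With one of each instruction this comes to roughly 6 to 7 ns, in line with
the rounded 3 + 2 + 1 + 1 = 7 ns above.  A loop that issues more than one
add or subtract per argument would land closer to the measured 11 ns.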

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34376>
_______________________________________