Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:
Here's a little more performance data that might suggest where speed optimizations could be found (in this patch I was mostly going for accuracy improvements).

On my 2.6 GHz (3.6 GHz burst) Haswell, hypot() with n arguments takes about 11*n + 60 ns per call.

The 60 ns fixed portion goes to several things: function call overhead, manipulating native Python objects scattered all over memory, Inf/NaN handling, and the external calls to __PyArg_ParseStack(), PyObject_Malloc(), PyFloat_AsDouble(), PyObject_Free(), and PyFloat_FromDouble().

The inlined summation routine accesses native C doubles at consecutive memory addresses. Per Agner Fog's instruction timing tables, DIVSD takes 10-13 cycles, which is about 3 ns. MULSD takes 5 cycles, which is about 2 ns. ADDSD and SUBSD each have a 3-cycle latency, adding about 1 ns apiece. Together that is roughly 7 ns (3 + 2 + 1 + 1), which accounts for most of the 11 ns per-argument variable portion of the running time.

Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34376>
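The following minimal pure-Python sketch makes the per-argument cost concrete. It models a scaled, compensated summation loop of the kind described above: one divide, one multiply, and a few add/subtract operations per argument. The loop structure is an assumption based on that description, not the patch's actual C code, and the name hypot_sketch is hypothetical. The setup before the loop stands in for the fixed-cost work. The loop body stands in for the per-argument work.

```python
import math


def hypot_sketch(*coords: float) -> float:
    """Euclidean norm via scaling plus compensated summation.

    Hypothetical, illustrative model of the loop described in the comment;
    it is not the code from the actual patch.
    """
    # Fixed-cost work: unbox each argument into a plain float, take
    # absolute values, and handle the special cases up front.
    vec = [abs(float(x)) for x in coords]
    if any(math.isinf(x) for x in vec):
        return math.inf          # Inf wins even over NaN, as in math.hypot
    if any(math.isnan(x) for x in vec):
        return math.nan
    max_val = max(vec, default=0.0)
    if max_val == 0.0 or len(vec) == 1:
        return max_val

    # Variable-cost work: a single pass over the doubles.
    # Starting csum at 1.0 keeps csum >= 1 >= x for every scaled term, so
    # the Fast2Sum error term (old - csum) + x is computed exactly.
    csum = 1.0
    frac = 0.0
    for x in vec:
        x /= max_val               # one divide (DIVSD in compiled C)
        x *= x                     # one multiply (MULSD)
        old = csum
        csum += x                  # add (ADDSD)
        frac += (old - csum) + x   # subtract/add to recover the lost low bits
    # Remove the 1.0 seed, fold in the correction, and undo the scaling.
    return max_val * math.sqrt(csum - 1.0 + frac)
```

For example, hypot_sketch(3, 4) returns 5.0. Dividing by the largest magnitude first avoids overflow for inputs like 1e300. Seeding the running sum with 1.0 lets the cheap two-operation error correction work without a per-term magnitude comparison.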