------- Comment #14 from rguenth at gcc dot gnu dot org  2007-06-10 12:07 -------
The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
the former have a throughput of 1/16 while the latter have 1/1 (latencies
compare 21 vs. 3 cycles).  This is on K10.  The optimization guide only
mentions calculating the division y = a/b via rcpss and the square root (!)
via rsqrtss:
sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)).
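As a rough illustration of the formula above, here is a minimal C sketch using the SSE intrinsics from xmmintrin.h. It assumes an x86 target with SSE enabled. The function names sqrt_via_rsqrtss and div_via_rcpss are made up for this example, and the division variant uses the usual Newton-Raphson step for a reciprocal, x1 = x0 * (2 - b * x0), which is an assumption here since only the sqrt formula is quoted. This is not what GCC emits, just the arithmetic spelled out.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set_ss, _mm_rsqrt_ss, _mm_rcp_ss */

/* sqrt(a) from the rsqrtss estimate plus one Newton-Raphson step,
   following the formula quoted above.
   Assumes a > 0 and finite; a == 0 would give 0 * inf = NaN and
   needs separate handling. */
float sqrt_via_rsqrtss(float a)
{
    /* x is a low-precision estimate of 1/sqrt(a) */
    float x = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(a)));
    return 0.5f * a * x * (3.0f - a * x * x);
}

/* a / b from the rcpss estimate plus one Newton-Raphson step
   (illustrative; the refinement step is an assumption, see above).
   Assumes b is nonzero and finite. */
float div_via_rcpss(float a, float b)
{
    /* x is a low-precision estimate of 1/b */
    float x = _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(b)));
    x = x * (2.0f - b * x);  /* refine the reciprocal estimate */
    return a * x;
}
```

Both functions replace one long-latency, low-throughput instruction with an estimate plus a few multiplies and subtracts. The result is close to, but not necessarily bit-identical with, what sqrtss or divss would return, so a transformation like this would presumably only be acceptable under relaxed floating-point settings.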

So the optimization would be mainly to improve instruction throughput, not
overall latency.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
