https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #32 from Chris Elrod <elrodc at gmail dot com> ---
(In reply to Marc Glisse from comment #31)
> (In reply to Chris Elrod from comment #30)
> > gcc calculates the rsqrt directly
>
> No, vrsqrt14ps is just the first step in calculating sqrt here (slightly
> different formula than rsqrt). vrcp14ps shows that it is computing an
> inverse later. What we need to understand is why gcc doesn't try to generate
> rsqrt (which would also have vrsqrt14ps, but a slightly different formula
> without the comparison with 0 and masking, and without needing an inversion
> afterwards).

Okay, I think I follow you. You're saying that instead of doing this (from
rguenther), which is what we want (also without the comparison to 0 and
masking, as you note):

 /* rsqrt(a) = -0.5 * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */

it is doing this, which also uses the rsqrt instruction:

 /* sqrt(a) = -0.5 * a * rsqrtss(a) * (a * rsqrtss(a) * rsqrtss(a) - 3.0) */

and then calculating an approximate inverse of that?

The approximate sqrt followed by an approximate reciprocal was slower on my
computer than just vsqrt followed by div.
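
To make the difference between the two formulas concrete, here is a minimal scalar sketch in C. It is not what gcc emits. `rsqrt_estimate` is a hypothetical stand-in for the hardware estimate (rsqrtss / vrsqrt14ps) using the well-known bit-trick approximation, and the final inversion uses an exact divide where the generated code would use vrcp14ps plus a refinement step. All function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rsqrtss / vrsqrt14ps: a rough estimate of
   1/sqrt(a), accurate to a few percent (bit-trick approximation). */
static float rsqrt_estimate(float a)
{
    uint32_t bits;
    float e;
    memcpy(&bits, &a, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&e, &bits, sizeof e);
    return e;
}

/* rsqrt(a) = -0.5 * e * (a * e * e - 3.0): one Newton-Raphson step. */
static float rsqrt_refined(float a)
{
    assert(a > 0.0f);
    float e = rsqrt_estimate(a);
    return -0.5f * e * (a * e * e - 3.0f);
}

/* sqrt(a) = -0.5 * a * e * (a * e * e - 3.0): same step, times a.
   A real rsqrt instruction returns +inf for 0, and 0 * inf is NaN,
   which is why this form needs the comparison with 0 and masking. */
static float sqrt_refined(float a)
{
    assert(a >= 0.0f);
    if (a == 0.0f)
        return 0.0f;
    float e = rsqrt_estimate(a);
    return -0.5f * a * e * (a * e * e - 3.0f);
}

/* The longer route discussed above: approximate sqrt, then invert.
   Exact division here is an assumption standing in for vrcp14ps
   followed by its own refinement. */
static float rsqrt_via_sqrt(float a)
{
    return 1.0f / sqrt_refined(a);
}
```

Both `rsqrt_refined(a)` and `rsqrt_via_sqrt(a)` approximate 1/sqrt(a), but the second needs the zero check and an extra inversion, which is the overhead the direct rsqrt formula avoids.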