https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101
Michael_S <already5chosen at yahoo dot com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |already5chosen at yahoo dot com --- Comment #4 from Michael_S <already5chosen at yahoo dot com> --- If you want quick fix for immediate shipment then you can take that: #include <math.h> #include <quadmath.h> __float128 quick_and_dirty_sqrtq(__float128 x) { if (isnanq(x)) return x; if (x==0) return x; if (x < 0) return nanq(""); if (isinfq(x)) return x; int xExp; x = frexpq(x, &xExp); if (xExp & 1) x *= 2.0; // x in [0.5:2.0) __float128 r = (__float128)(1.0/sqrt((double)x)); // r=rsqrt(x) estimate (53 bits) r *= 1.5 - r*r*x*0.5; // NR iteration improves precision of r to 105.4 bit __float128 y = x*r; // y=sqrt(x) estimate (105.4 bits) // extended-precision NR iteration __float128 yH = (double)y; __float128 yL = y - yH; __float128 deltaX = x - yH*yH; deltaX -= yH*yL*2; deltaX -= yL*yL; y += deltaX*r*0.5; // improve precision of y to ~210.2 bits. Not enough for perfect rounding, but not too bad return ldexpq(y, xExp >> 1); } It is very slow, even slower than what you have now, which by itself is quite astonishingly slow. It is also not sufficiently precise for correct rounding in all cases. But, at least, the worst error is something like (0.5+2**-98) ULP, so you are unlikely to be ever caught by black box type of testing. It's biggest advantage is extreme portability. Should run on all platforms where double==IEEE binary64 and __float128 == IEEE binary128. May be, few days later I'll have better variant for "good" 64-bit platforms i.e. for those where we have __int128. It would be 15-25 times faster than the variant above and rounding would be mathematically correct rather than just "impossible to be caught" like above. But it would not run everywhere. Also, I want to give it away under MIT or BSD license, rather than under GPL.