https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101
--- Comment #18 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
libquadmath is essentially legacy code. People working directly in C should be using the C23 _Float128 interfaces and *f128 functions, as in current glibc, rather than the libquadmath interfaces (unless their code needs to support old glibc or non-glibc C libraries that don't support _Float128 in C23 Annex H). It would be desirable to make GCC generate *f128 calls when appropriate from Fortran code using this format as well; see <https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578937.html> for more discussion of the different cases involved.

Most of libquadmath is derived from code in glibc - some of it can now be updated from the glibc code automatically (see update-quadmath.py), while other parts can't (although it would certainly be desirable to extend update-quadmath.py to cover that other code as well). See the commit message for commit 4239f144ce50c94f2c6cc232028f167b6ebfd506 for a more detailed discussion of what code comes from glibc and what is / is not automatically handled by update-quadmath.py. Since update-quadmath.py hasn't been run for a while, it might need changes to work with more recent changes to the glibc code.

sqrtq.c is one of the files not based on glibc code. That's probably because glibc didn't have a convenient generic implementation of binary128 sqrt to use when libquadmath was added - it has soft-fp implementations used for various architectures, but those require an sfp-machine.h for each architecture (which we may in fact have in libgcc for each relevant architecture, but it's an extra complication). Certainly making it possible to use code from glibc for binary128 sqrt would be a good idea, but while we aren't doing that, it should also be OK to improve sqrtq locally in libquadmath.

The glibc functions for this format are generally *not* optimized for speed yet (this includes the soft-fp-based versions of sqrt). Note that what's best for speed may depend a lot on whether the architecture has hardware support for binary128 arithmetic; if it has such support, an implementation based on binary128 floating-point operations is more likely to be efficient; if it doesn't, direct use of integer arithmetic, without lots of intermediate packing / unpacking into the binary128 format, is likely to be more efficient. See the discussion starting at <https://sourceware.org/pipermail/libc-alpha/2020-June/thread.html#115229> for more on this - glibc is a better place for working on most optimized function implementations than GCC.

See also <https://core-math.gitlabpages.inria.fr/> - those functions aim to be correctly rounding, which is *not* a goal for most glibc libm functions, but are still quite likely to be faster than the existing non-optimized functions in glibc.

fma is a particularly tricky case because it *is* required to be correctly rounding, in all rounding modes, and correct rounding implies correct exceptions, *and* correct exceptions for fma include getting right the architecture-specific choice of whether tininess is detected before or after rounding. Correct exceptions for sqrt are simpler, but to be correct for glibc it still needs to avoid spurious "inexact" exceptions - for example, from the use of double in intermediate computations in your version (see the optimized feholdexcept / fesetenv operations used in glibc for cases where exceptions from intermediate computations are to be discarded).
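As an illustration of that feholdexcept / fesetenv pattern, here is a minimal sketch using the portable <fenv.h> interfaces (glibc internally uses optimized equivalents, and a real libm implementation also has to keep the compiler from moving floating-point operations across the environment switches; the intermediate step below is a made-up placeholder, not code from glibc or libquadmath):

/* Hedged sketch: discard exception flags raised by intermediate
   computations, so that e.g. a double intermediate step cannot leak a
   spurious "inexact" to the caller.  */
#include <fenv.h>

static double
refine (double x)        /* hypothetical intermediate step; may raise FE_INEXACT */
{
  return x * (1.0 / 3.0);
}

double
compute_discarding_intermediate_exceptions (double x)
{
  fenv_t env;
  double approx, result;

  feholdexcept (&env);   /* save the caller's environment, clear the
                            exception flags, install non-stop mode */

  approx = refine (x);   /* any flags raised here will be thrown away */

  fesetenv (&env);       /* restore the caller's environment, discarding
                            whatever the intermediate step raised */

  result = approx;       /* in real code, the final operation that should
                            determine the result's own exceptions would be
                            performed here, after the restore */
  return result;
}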
For functions that aren't required to be correctly rounding, the glibc manual discusses the accuracy goals, including goals for exceptions - e.g. avoiding spurious "underflow" exceptions from intermediate computations in cases where the rounded result returned is not consistent with rounding a tiny, inexact value.
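To make the first point above concrete, here is a minimal sketch of using the glibc *f128 interfaces directly rather than libquadmath. It assumes a glibc new enough to provide _Float128 support (2.26 or later, on the architectures that have it), links against libm only (no -lquadmath), and uses the f128 literal suffix, which GCC accepts for this type:

/* Hedged sketch: C23-style _Float128 interfaces from glibc, instead of
   libquadmath's __float128 / sqrtq / quadmath_snprintf.  */
#define __STDC_WANT_IEC_60559_TYPES_EXT__ 1   /* expose the *f128 declarations
                                                 when not compiling as C23 */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  _Float128 x = 2.0f128;
  _Float128 r = sqrtf128 (x);                  /* from libm, not libquadmath */

  char buf[64];
  strfromf128 (buf, sizeof buf, "%.36g", r);   /* instead of quadmath_snprintf */
  puts (buf);
  return 0;
}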
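And to illustrate the point about architectures without hardware binary128 arithmetic, where operating on the representation with integer arithmetic avoids repeated packing / unpacking, the kind of access involved looks roughly like the following hypothetical helper (assuming GCC, a little-endian target, and the standard binary128 layout; this is not taken from glibc or libquadmath):

/* Hedged sketch of working on the binary128 representation directly with
   integer arithmetic.  The union layout assumes a little-endian target.  */
#include <stdint.h>

typedef union
{
  _Float128 value;
  struct
  {
    uint64_t lo;   /* low 64 bits of the significand */
    uint64_t hi;   /* sign (1 bit), biased exponent (15 bits),
                      high 48 bits of the significand */
  } rep;
} binary128_rep;

/* Halve a positive, finite, normal binary128 value by adjusting the biased
   exponent field directly, without any binary128 arithmetic (illustrative
   only; real code must handle zeros, subnormals, infinities and NaNs).  */
static _Float128
halve_normal (_Float128 x)
{
  binary128_rep u;
  u.value = x;
  u.rep.hi -= (uint64_t) 1 << 48;   /* decrement the exponent field by 1 */
  return u.value;
}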