https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77776
--- Comment #15 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
Your implementation still needs to solve:

1. Loss of precision because of division & subsequent scaling by max. Users
comparing std::hypot(x, y, z) against a simple std::sqrt(x * x + y * y + z * z)
might wonder why they want to use std::hypot if it's less precise.

2. Relatively high cost (in latency and throughput) because of the three
divisions. You could replace it with scale = 1/max; x *= scale; ... But that
can reduce precision even further.

3. Summation of the x, y, and z squares isn't associative if you care about
precision. A high quality implementation needs to add the two lowest values
first.

Here's a precise but inefficient implementation:
(https://compiler-explorer.com/z/ocGPnsYE3)

template <typename T>
[[gnu::optimize("-fno-unsafe-math-optimizations")]]
T
hypot3(T x, T y, T z)
{
  x = std::abs(x);
  y = std::abs(y);
  z = std::abs(z);
  if (std::isinf(x) || std::isinf(y) || std::isinf(z))
    return std::__infinity_v<T>;
  else if (std::isnan(x) || std::isnan(y) || std::isnan(z))
    return std::__quiet_NaN_v<T>;
  else if (x == y && y == z)
    return x * std::sqrt(T(3));
  else if (z == 0 && y == 0)
    return x;
  else if (x == 0 && z == 0)
    return y;
  else if (x == 0 && y == 0)
    return z;
  else
    {
      T hi = std::max(std::max(x, y), z);
      T lo0 = std::min(std::max(x, y), z);
      T lo1 = std::min(x, y);
      int e = 0;
      hi = std::frexp(hi, &e);
      lo0 = std::ldexp(lo0, -e);
      lo1 = std::ldexp(lo1, -e);
      T lo = lo0 * lo0 + lo1 * lo1;
      return std::ldexp(std::sqrt(hi * hi + lo), e);
    }
}

AFAIK
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/experimental/bits/simd_math.h;h=06e7b4496f9917f886f66fbd7629700dd17e55f9;hb=HEAD#l1168
is a precise and efficient implementation. It also avoids division altogether
unless an input is subnormal.
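
For readers following along, here is a minimal sketch of the three points above. It is not taken from libstdc++ or from the patch under review: the names naive_hypot3, scaled_hypot3, sum_small_first and self_check are hypothetical. The sketch assumes IEEE 754 binary64 double with round-to-nearest-even and no -ffast-math, and it deliberately skips the inf/NaN handling that a real std::hypot needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// The "simple" formula from item 1. It is cheap, but the squares can
// overflow (or underflow) even when the true result is representable.
inline double naive_hypot3(double x, double y, double z) {
    return std::sqrt(x * x + y * y + z * z);
}

// Division-based scaling as criticised in items 1 and 2: three divisions,
// and each quotient is rounded before it is squared.
// Illustration only: no inf/NaN handling.
inline double scaled_hypot3(double x, double y, double z) {
    x = std::abs(x);
    y = std::abs(y);
    z = std::abs(z);
    const double m = std::max({x, y, z});
    if (m == 0.0)
        return 0.0;
    x /= m;
    y /= m;
    z /= m;
    return m * std::sqrt(x * x + y * y + z * z);
}

// Item 3: add the two smallest terms first, then the largest.
inline double sum_small_first(double a, double b, double c) {
    double v[3] = {a, b, c};
    std::sort(v, v + 3);
    return v[2] + (v[0] + v[1]);
}

// Small self-check that exercises the effects described above.
inline void self_check() {
    // 1e200 squared exceeds the double range, so the naive form overflows
    // while the scaled form stays finite.
    assert(std::isinf(naive_hypot3(1e200, 1e200, 1e200)));
    assert(std::isfinite(scaled_hypot3(1e200, 1e200, 1e200)));

    // h is half an ulp of 1.0. Adding it to 1.0 twice loses both terms
    // (ties round to even), but adding h + h first keeps them.
    const double h = std::ldexp(1.0, -53);
    const double left_to_right = (1.0 + h) + h;
    assert(left_to_right == 1.0);
    assert(sum_small_first(1.0, h, h)
           == 1.0 + std::numeric_limits<double>::epsilon());
}
```

Calling self_check() from main should pass silently under the stated assumptions. The hypot3 posted above goes further than this sketch: it scales by a power of two via frexp/ldexp, which is exact apart from underflow, instead of dividing by max.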