Richard Henderson <richard.hender...@linaro.org> writes:
> Rename to parts$N_sqrt. > Reimplement float128_sqrt with FloatParts128. > > Reimplement with the inverse sqrt newton-raphson algorithm from musl. > This is significantly faster than even the berkeley sqrt n-r algorithm, > because it does not use division instructions, only multiplication. > > Ordinarily, changing algorithms at the same time as migrating code is > a bad idea, but this is the only way I found that didn't break one of > the routines at the same time. I can't pretend to follow the details of the method as well as I could the original but that's why we have tests so if they are happy I'm happy: Tested-by: Alex Bennée <alex.ben...@linaro.org> <snip> > + > + if (N == 64) { > + /* float64 or smaller */ > + > + r32 = ((uint64_t)r32 * u32) >> 31; > + /* |r*sqrt(m) - 1| < 0x1.7Bp-16 */ > + > + s32 = ((uint64_t)m32 * r32) >> 32; > + d32 = ((uint64_t)s32 * r32) >> 32; > + u32 = three32 - d32; > + > + if (fmt->frac_size <= 23) { > + /* float32 or smaller */ > + > + s32 = ((uint64_t)s32 * u32) >> 32; /* 3.29 */ > + s32 = (s32 - 1) >> 6; /* 9.23 */ > + /* s < sqrt(m) < s + 0x1.08p-23 */ > + > + /* compute nearest rounded result to 2.23 bits */ > + uint32_t d0 = (m32 << 16) - s32 * s32; > + uint32_t d1 = s32 - d0; > + uint32_t d2 = d1 + s32 + 1; > + s32 += d1 >> 31; > + a->frac_hi = (uint64_t)s32 << (64 - 25); > + > + /* increment or decrement for inexact */ > + if (d2 != 0) { > + a->frac_hi += ((int32_t)(d1 ^ d2) < 0 ? -1 : 1); > + } > + goto done; > + } > + > + /* float64 */ > + > + r64 = (uint64_t)r32 * u32 * 2; > + /* |r*sqrt(m) - 1| < 0x1.37-p29; convert to 64-bit arithmetic */ > + mul64To128(m64, r64, &s64, &discard); > + mul64To128(s64, r64, &d64, &discard); > + u64 = three64 - d64; > + > + mul64To128(s64, u64, &s64, &discard); /* 3.61 */ > + s64 = (s64 - 2) >> 9; /* 12.52 */ > + > + /* Compute nearest rounded result */ > + uint64_t d0 = (m64 << 42) - s64 * s64; > + uint64_t d1 = s64 - d0; > + uint64_t d2 = d1 + s64 + 1; > + s64 += d1 >> 63; > + a->frac_hi = s64 << (64 - 54); > + > + /* increment or decrement for inexact */ > + if (d2 != 0) { > + a->frac_hi += ((int64_t)(d1 ^ d2) < 0 ? -1 : 1); > + } > + goto done; > + } I usually take more convincing about gotos but I can't see a way of doing it more neatly so have a: Reviewed-by: Alex Bennée <alex.ben...@linaro.org> as well :-) -- Alex Bennée