Hi, >> Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the >> relative error >> should be much better. If there is no FMA, 2*(1-fabs(x)) - (1-fabs(x))^2 >> should be >> more accurate when abs(x)>0.5 and still much faster. > >No, but I will check how to enable it if FMA is available. > I did a minor test with your formula and the precision improved a lot.
> But now I am puzzled about how did you come up with that formula :-). > I am able to proof equality, but how did you know it was going to be > more precise? Basically when x is close to 1, x the top N bits in the mantissa will be ones. Then x*x has one bits in the top 2*N bits in the mantissa. Ie. we lose N bits of useful information in the multiply - problematic when N gets close to the number of mantissa bits. In contrast FMA computes the fully accurate result due to cancellation of the top 2*N one-bits in the subtract. If we can use (1-x) instead of x in the evaluation, we avoid losing accuracy in the multiply when x is close to 1. Then it's basic algebra to find an equivalent formula that can produce 1-x^2 using 1-x. For example (1+x)*(1-x) will work fine too (using 1+x loses 1 low bit of x). Note that these alternative evaluations lose accuracy close to 0 in exactly the same way, so if no FMA is available you'd need to select between the 2 cases. Wilco