On Fri, Oct 19, 2018 at 01:39:01PM +0000, Wilco Dijkstra wrote:
> >> Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the
> >> relative error should be much better. If there is no FMA,
> >> 2*(1-fabs(x)) - (1-fabs(x))^2 should be more accurate when abs(x)>0.5
> >> and still much faster.
> >
> > No, but I will check how to enable it if FMA is available.
> > I did a minor test with your formula and the precision improved a lot.
> > But now I am puzzled about how you came up with that formula :-).
> > I am able to prove equality, but how did you know it was going to be
> > more precise?
>
> Basically, when x is close to 1, the top N bits in the mantissa of x will
> be ones. Then x*x has one-bits in the top 2*N bits of the mantissa, i.e.
> we lose N bits of useful information in the multiply - problematic when N
> gets close to the number of mantissa bits. In contrast, FMA computes the
> fully accurate result due to cancellation of the top 2*N one-bits in the
> subtract.
>
> If we can use (1-x) instead of x in the evaluation, we avoid losing
> accuracy in the multiply when x is close to 1. Then it's basic algebra to
> find an equivalent formula that produces 1-x^2 using 1-x. For example,
> (1+x)*(1-x) will work fine too (using 1+x loses 1 low bit of x).
>
> Note that these alternative evaluations lose accuracy close to 0 in
> exactly the same way, so if no FMA is available you'd need to select
> between the 2 cases.
At this point this seems like something that shouldn't be done inline
anymore. So either we don't do this optimization at all, because the errors
are far bigger than what is acceptable even for -ffast-math, or we have a
library function somewhere (libm, libgcc, ...) that does the sinh (tanh (x))
and cosh (tanh (x)) computations and handles all the corner cases.

	Jakub