> -----Original Message-----
> I agree - for example powerpc has -mrecip= to control which instructions
> to use (float/double rsqrt or inverse) and -mrecip-precision to
> specify whether further iteration is done or not.
>
> x86 has similar but does always perform newton raphson iteration,
> documenting 2 ulp instead of 0.5 ulp precision.
>
> Your suggested huge reduction in precision isn't usually acceptable
> and should be always explicitely enabled.
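
For readers unfamiliar with the refinement step mentioned in the quoted text, here is a minimal sketch of one Newton-Raphson iteration on a reciprocal estimate. It is illustrative only: crude_recip() is a hypothetical stand-in for a hardware estimate instruction (it is not how any amdgcn, x86, or powerpc instruction behaves), and the choice of discarding 12 mantissa bits is an arbitrary assumption made so the effect is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a low-precision reciprocal instruction:
   compute 1/x, then clear the low 12 mantissa bits so that only about
   11 bits of the significand remain meaningful.  Assumes x > 0. */
static float crude_recip(float x)
{
    assert(x > 0.0f);
    float r = 1.0f / x;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);   /* well-defined type punning */
    bits &= 0xFFFFF000u;              /* drop the low 12 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for f(r) = 1/r - x, i.e. r' = r * (2 - x*r). */
static float refine_recip(float x, float r)
{
    return r * (2.0f - x * r);
}

/* Relative error of approx against a double-precision 1/x (x > 0). */
static double rel_error(float x, float approx)
{
    double exact = 1.0 / (double)x;
    double diff = (double)approx - exact;
    if (diff < 0.0)
        diff = -diff;
    return diff / exact;
}
```

Each step roughly squares the relative error until float rounding dominates, which is why a compiler can choose between emitting the bare estimate and emitting the estimate plus one or more refinement steps. Comparing rel_error(x, crude_recip(x)) with rel_error(x, refine_recip(x, crude_recip(x))) for a few inputs shows the difference.
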
There isn't a problem with *this* patch (although we do have existing accuracy issues thanks to previous documents lacking the information). The "inaccurate" instructions are single-precision only, and therefore acceptable with -ffast-math. Kwok intends to provide vectorized library calls for the double-precision and -fno-fast-math cases.

In general I want to avoid adding extra arch-specific options; partly because approximately no one will use them, and partly because the amdgcn compiler is almost always hidden behind an x86_64 compiler.

Andrew