On Apr 27, 2007, at 06:12, Janne Blomqvist wrote:
I agree it can be an issue, but OTOH people who care about precision probably (1) avoid -ffast-math and (2) use double precision (where these reciprocal instructions are not available). Intel calls it -no-prec-div, but it's enabled for the "-fast" catch-all option.

On a related note, our beloved competitors generally have some high level flag for combining all these fancy and potentially unsafe optimizations (e.g. -O4, -fast, -fastsse, -Ofast, etc.). For gcc, at least FP benchmarks seem to do generally well with something like "-O3 -funroll-loops -ftree-vectorize -ffast-math -march=native -mfpmath=sse", but it's quite a mouthful.

No, using only 12 bits of precision is just ridiculous and should
not be included in -ffast-math. You should always use a Newton-Raphson
step after getting the 12-bit approximation. When done correctly
this doubles the precision and gets you just about the 24 bits of
precision needed for float. Reciprocal approximations are meant
to be used that way, and it's no accident the lookup provides
exactly half the bits needed. For double precision you just do
two more iterations, which is why there is no need for double
precision variants of these instructions.
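
To make the refinement concrete, here is a minimal sketch in portable C.
It is only an illustration under stated assumptions, not what any
compiler actually emits: approx_recip12() is a hypothetical stand-in
for the hardware estimate (it truncates 1/x to about 12 significant
bits), and refine_recip() and rel_error() are names made up for this
example. The step itself is the usual y' = y * (2 - x*y), which
roughly squares the relative error of the estimate.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate:
 * computes 1/x and then keeps only about 12 significant bits. */
static float approx_recip12(float x)
{
    float r = 1.0f / x;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;          /* clear the low 12 stored mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for f(y) = 1/y - x.
 * If y = (1/x)(1 - e), the result is (1/x)(1 - e*e), ignoring rounding. */
static float refine_recip(float x, float y)
{
    return y * (2.0f - x * y);
}

/* Relative error of y as an approximation to 1/x, measured in double. */
static double rel_error(float x, float y)
{
    return fabs((double)x * (double)y - 1.0);
}
```

Calling rel_error(x, approx_recip12(x)) and then
rel_error(x, refine_recip(x, approx_recip12(x))) for a few values of x
shows the error dropping from the order of 2^-12 to near single
precision rounding error after one step.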

The cost for the extra step is small, and you get good results.
There are many variations possible, and with fused multiply-add
it's even possible to get correctly rounded results at low cost.
I truly doubt that any of the compilers you mention use these
instructions without NR iteration to get required precision.

  -Geert
