Hello!

This patch builds on the recent patch by Michael (which implemented fine-grained control of the -mrecip option). With -ffast-math, it emits reciprocal sequences with an additional Newton-Raphson (NR) step for vectorized SFmode division and vectorized sqrtf(x).
2011-10-20  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/i386.h (RECIP_MASK_DEFAULT): New define.
	* config/i386/i386.opt (recip_mask): Initialize with
	RECIP_MASK_DEFAULT.
	* doc/invoke.texi (mrecip): Document that GCC implements
	vectorized single float division and vectorized sqrtf(x)
	with reciprocal sequence with additional Newton-Raphson step
	with -ffast-math.

The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph to check that I didn't mess something up with the options handling.

With the patch, gas_dyn from the polyhedron testsuite runs 7% faster on corei7-avx.

Uros.
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 180176)
+++ config/i386/i386.h	(working copy)
@@ -2322,6 +2322,7 @@
 #define RECIP_MASK_VEC_SQRT 0x08
 #define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \
                         | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
+#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
 
 #define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0)
 #define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0)
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 180176)
+++ config/i386/i386.opt	(working copy)
@@ -32,7 +32,7 @@
 HOST_WIDE_INT ix86_isa_flags_explicit
 
 TargetVariable
-int recip_mask
+int recip_mask = RECIP_MASK_DEFAULT
 
 Variable
 int recip_mask_explicit
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 180176)
+++ doc/invoke.texi	(working copy)
@@ -12927,6 +12927,11 @@
 already with @option{-ffast-math} (or the above option combination), and
 doesn't need @option{-mrecip}.
 
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single float division and vectorized sqrtf(x) already with
+@option{-ffast-math} (or the above option combination), and doesn't need
+@option{-mrecip}.
+
 @item -mrecip=@var{opt}
 @opindex mrecip=opt
 This option allows to control which reciprocal estimate instructions