On Thu, Oct 20, 2011 at 4:45 PM, Joseph S. Myers <jos...@codesourcery.com> wrote:
>> The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph
>> to check if I didn't mess something with options handling.
>
> I have no comments on the option handling in this patch.
>
>> +for vectorized single float division and vectorized sqrtf(x) already with
>
> @code{sqrtf (@var{x})}

Thanks - fixed, with a similar fix in the previous paragraph.

I also found a PR that deals with vectorized reciprocal, so I
referred to the PR in the ChangeLog entry:

2011-10-20  Uros Bizjak  <ubiz...@gmail.com>

        PR target/47989
        * config/i386/i386.h (RECIP_MASK_DEFAULT): New define.
        * config/i386/i386.opt (recip_mask): Initialize with
        RECIP_MASK_DEFAULT.
        * doc/invoke.texi (ix86 Options, -mrecip): Document that GCC
        implements vectorized single float division and vectorized
        sqrtf(x) with reciprocal sequence with additional Newton-Raphson
        step with -ffast-math.

Attached is the patch that was committed to mainline SVN.

Encouraged by Michael's results, let's see what automated benchmark
testers will show.

Uros.
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h (revision 180255)
+++ config/i386/i386.h (working copy)
@@ -2322,6 +2322,7 @@
 #define RECIP_MASK_VEC_SQRT 0x08
 #define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \
                         | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
+#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
 
 #define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0)
 #define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0)
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt (revision 180255)
+++ config/i386/i386.opt (working copy)
@@ -32,7 +32,7 @@
 HOST_WIDE_INT ix86_isa_flags_explicit
 
 TargetVariable
-int recip_mask
+int recip_mask = RECIP_MASK_DEFAULT
 
 Variable
 int recip_mask_explicit
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 180255)
+++ doc/invoke.texi (working copy)
@@ -12922,7 +12922,12 @@
 of the non-reciprocal instruction, the precision of the sequence can be
 decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
 
-Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS)
+Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of RSQRTSS
+(or RSQRTPS) already with @option{-ffast-math} (or the above option
+combination), and doesn't need @option{-mrecip}.
+
+Also note that GCC emits the above sequence with additional Newton-Raphson step
+for vectorized single float division and vectorized @code{sqrtf(@var{x})}
 already with @option{-ffast-math} (or the above option combination), and
 doesn't need @option{-mrecip}.
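
For readers who have not seen the sequence that the documentation change describes, the following is a minimal scalar sketch of "reciprocal estimate plus one Newton-Raphson step". It is not the code GCC emits. approx_recip is a hypothetical, portable stand-in for the RCPSS/RCPPS estimate (the 2^-12 relative error is an assumption chosen for illustration), and recip_nr, div_nr and rel_err are names invented here.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for the hardware reciprocal estimate: the exact
   reciprocal, deliberately perturbed by a relative error of 2^-12.
   This is an assumption for illustration, not what RCPSS returns. */
float approx_recip(float x)
{
    assert(x != 0.0f);
    return (1.0f / x) * (1.0f + 1.0f / 4096.0f);
}

/* One Newton-Raphson step for f(r) = 1/r - x:
   r1 = r0 * (2 - x * r0).
   If r0 has relative error e, r1 has relative error about e * e. */
float recip_nr(float x)
{
    float r0 = approx_recip(x);
    return r0 * (2.0f - x * r0);
}

/* Division rewritten as multiplication by the refined reciprocal. */
float div_nr(float a, float b)
{
    return a * recip_nr(b);
}

/* Relative error of approx with respect to exact (exact must be nonzero). */
float rel_err(float approx, float exact)
{
    return fabsf(approx - exact) / fabsf(exact);
}
```

The point of the extra step is that it roughly squares the relative error of the estimate, which is why the refined result lands within a few ulp of the true quotient rather than matching it bit for bit.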