On Wed, Jan 09, 2019 at 12:03:45PM +0100, Jakub Jelinek wrote:
> > The above is a typical example. So, to give a complete source 'vec_sqrt.cc':
> > 
> > #include <cmath>
> > 
> > extern float data [ 32768 ] ;
> > 
> > extern void vf1()
> > {
> >   #pragma vectorize enable
> >   for ( int i = 0 ; i < 32768 ; i++ )
> >     data [ i ] = std::sqrt ( data [ i ] ) ;
> > }
> > 
> > This has a large trip count, the loop is trivial. It's an ideal candidate
> > for autovectorization. When I compile this source, using
> > 
> > g++ -O3 -mavx2 -S -o sqrt.s sqrt_gcc.cc
> 
> Generally you want -Ofast or -ffast-math or at least some suboptions of that
> if you want to vectorize floating point loops, because vectorization in many
> cases changes where FPU exceptions would be generated, can affect precision
> by reordering the ops etc. In the above case it is just that glibc
> declares the vector math functions for #ifdef __FAST_MATH__ only, as they
> have worse precision.

Actually, the last sentence was just a wrong guess in this case. For sqrt no
glibc libcall is needed; that is only for trigonometric functions and the like.
Of the -ffast-math suboptions, all you need for the above loop to vectorize is
-fno-math-errno, which tells the compiler you don't need errno set when sqrt is
called on a negative argument etc.
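To illustrate the side effect that -fno-math-errno lets the compiler ignore, here is a small sketch (the function name sqrt_sets_edom is hypothetical, not part of any library). It only checks whether calling std::sqrt set errno to EDOM; whether that happens depends on the platform's math_errhandling and on the compiler flags used.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper: returns true if std::sqrt(x) set errno to EDOM.
// That errno store is the observable side effect which -fno-math-errno
// tells GCC it may drop, so the loop body no longer needs the library call.
bool sqrt_sets_edom(float x) {
  errno = 0;
  volatile float arg = x;            // volatile: keep the call from being constant-folded
  volatile float r = std::sqrt(arg); // NaN for negative x; may also set errno
  (void)r;
  return errno == EDOM;
}
```

Built without -fno-math-errno against a libm whose math_errhandling includes MATH_ERRNO (glibc, for instance), sqrt_sets_edom(-1.0f) would be expected to return true; where math_errhandling does not include MATH_ERRNO it need not.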
With -fopt-info-vec-missed the compiler would tell you:
/tmp/1.c:5:3: note: not vectorized: control flow in loop.
/tmp/1.c:5:3: note: bad loop form.
and you could look at the dumps to see that there is
  _2 = .SQRT (_1);
  if (_1 u>= 0.0)
    goto <bb 8>; [99.95%]
  else
    goto <bb 4>; [0.05%]
...
  <bb 4> [local count: 531495]:
  __builtin_sqrt (_1);
which is the idiom for computing sqrt inline with the hardware instruction
while, in the unlikely case that the argument is negative, also calling the
library function so that it handles setting errno.
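A scalar C++ model of the control flow in that dump may make it easier to follow. This is only a sketch: hw_sqrt and g_library_calls are hypothetical names, and hw_sqrt merely stands in for the sqrt instruction (approximated here with std::sqrt, which unlike the real instruction may itself set errno). The key detail is that `u>=` means "unordered or greater-or-equal", so the library call is reached only for an ordered x < 0, which is exactly what std::isless tests.

```cpp
#include <cmath>

// Hypothetical counter, present only so the sketch can show when the slow path runs.
static int g_library_calls = 0;

// Hypothetical stand-in for the hardware sqrt instruction. Approximated with
// std::sqrt for portability; only its return value matters in this model.
static float hw_sqrt(float x) { return std::sqrt(x); }

// Models:  _2 = .SQRT (_1);
//          if (_1 u>= 0.0) <use _2> else { __builtin_sqrt (_1); <use _2> }
float sqrt_with_errno_fallback(float x) {
  float r = hw_sqrt(x);          // inline instruction computes the result
  // Negation of "x u>= 0": ordered and x < 0. NaN and -0.0 take the fast path.
  if (std::isless(x, 0.0f)) {
    ++g_library_calls;
    (void)std::sqrt(x);          // library call kept only for its errno side effect
  }
  return r;
}
```

The branch is what the vectorizer reports as "control flow in loop"; with -fno-math-errno the else path is unnecessary and only the .SQRT remains.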

        Jakub
