On Wed, Jan 09, 2019 at 12:03:45PM +0100, Jakub Jelinek wrote:
> > The above is a typical example. So, to give a complete source 'vec_sqrt.cc':
> >
> > #include <cmath>
> >
> > extern float data [ 32768 ] ;
> >
> > extern void vf1()
> > {
> >   #pragma vectorize enable
> >   for ( int i = 0 ; i < 32768 ; i++ )
> >     data [ i ] = std::sqrt ( data [ i ] ) ;
> > }
> >
> > This has a large trip count, the loop is trivial. It's an ideal candidate
> > for autovectorization. When I compile this source, using
> >
> > g++ -O3 -mavx2 -S -o sqrt.s sqrt_gcc.cc
>
> Generally you want -Ofast or -ffast-math or at least some suboptions of that
> if you want to vectorize floating point loops, because vectorization in many
> cases changes where FPU exceptions would be generated, can affect precision
> by reordering the ops etc.  In the above case it is just that glibc
> declares the vector math functions for #ifdef __FAST_MATH__ only, as they
> have worse precision.

Actually, the last sentence was just a wrong guess in this case: for sqrt no
glibc libcall is needed, that is only for trigonometric functions and the like.
All you need from -ffast-math for the above to vectorize is -fno-math-errno,
which tells the compiler you don't need errno set if you call sqrt on a
negative value etc.

With -fopt-info-vec-missed the compiler would tell you:
/tmp/1.c:5:3: note: not vectorized: control flow in loop.
/tmp/1.c:5:3: note: bad loop form.
and you could look at the dumps to see that there is
  _2 = .SQRT (_1);
  if (_1 u>= 0.0)
    goto <bb 8>; [99.95%]
  else
    goto <bb 4>; [0.05%]
...
  <bb 4> [local count: 531495]:
  __builtin_sqrt (_1);
which is the idiom to do sqrt inline using an instruction, but in the unlikely
case when the argument is negative, also call the library function so that
it handles the errno setting.

	Jakub
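
To make the control flow in that dump concrete, here is a hand-written sketch
of the same shape in C++. It is only an illustration, not what GCC emits: the
name sqrt_errno_idiom is made up, and the unlikely path sets errno directly
where the real expansion calls the library sqrt to do so.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper, not part of GCC or glibc: mirrors by hand the shape
// of the sqrt expansion shown in the dump.
float sqrt_errno_idiom(float x)
{
    // Stands in for the inline .SQRT; computed unconditionally, as in the dump.
    float r = std::sqrt(x);

    // "_1 u>= 0.0" holds for x >= 0 and for NaN, which is exactly !(x < 0).
    if (!(x < 0.0f))
        return r;   // likely path: nothing more to do

    // Unlikely path: negative argument. The real code calls the library
    // sqrt here so that it sets errno; this sketch (an assumption made for
    // illustration) sets it directly instead.
    errno = EDOM;
    return r;
}
```

The branch in the middle of the loop body is the "control flow in loop" the
vectorizer complains about. With -fno-math-errno only the sqrt itself is left.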