On Wed, 2019-01-09 at 11:10 -0500, David Malcolm wrote:
> On Wed, 2019-01-09 at 09:56 +0000, Jonathan Wakely wrote:
> > On Wed, 9 Jan 2019 at 09:50, Andrew Haley wrote:
> > > I don't agree. Sometimes vectorization is critical. It would be
> > > nice to have a warning which would fire if vectorization failed.
> > > That would surely help the OP.
> >
> > Dave Malcolm has been working on something like that:
> > https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01749.html
>
> Yes: this code is in trunk for gcc 9, but it doesn't help much for
> the case given elsewhere in this thread:
>
> #include <cmath>
>
> extern float data [ 32768 ] ;
>
> extern void vf1()
> {
>     #pragma vectorize enable
>     for ( int i = 0 ; i < 32768 ; i++ )
>         data [ i ] = std::sqrt ( data [ i ] ) ;
> }
>
> Compiling on this x86_64 box with -fopt-info-vec-missed shows the
> rather cryptic:
>
> g++ -c /tmp/sqrt-test.cc -O3 -mavx2 -fopt-info-vec-missed
> /tmp/sqrt-test.cc:8:24: missed: couldn't vectorize loop
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: control flow in loop.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: __builtin_sqrtf (_1);
>
> and with -fopt-info-vec-all-internals shows:
>
> g++ -c /tmp/sqrt-test.cc -O3 -mavx2 -fopt-info-vec-all-internals
>
> Analyzing loop at /tmp/sqrt-test.cc:8
> /tmp/sqrt-test.cc:8:24: note: === analyze_loop_nest ===
> /tmp/sqrt-test.cc:8:24: note: === vect_analyze_loop_form ===
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: control flow in loop.
> /tmp/sqrt-test.cc:8:24: missed: bad loop form.
> /tmp/sqrt-test.cc:8:24: missed: couldn't vectorize loop
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: control flow in loop.
> /tmp/sqrt-test.cc:5:13: note: vectorized 0 loops in function.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note: === vect_slp_analyze_bb ===
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note: === vect_analyze_data_refs ===
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note: got vectype for stmt: _1 = data[i_12];
> vector(8) float
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed: not vectorized: not enough data-refs in basic block.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: __builtin_sqrtf (_1);
> /tmp/sqrt-test.cc:8:24: note: === vect_slp_analyze_bb ===
> /tmp/sqrt-test.cc:8:24: note: === vect_analyze_data_refs ===
> /tmp/sqrt-test.cc:8:24: note: got vectype for stmt: data[i_12] = _7;
> vector(8) float
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: not enough data-refs in basic block.
> /tmp/sqrt-test.cc:10:1: note: === vect_slp_analyze_bb ===
> /tmp/sqrt-test.cc:10:1: note: === vect_analyze_data_refs ===
> /tmp/sqrt-test.cc:10:1: missed: not vectorized: not enough data-refs in basic block.
>
> I had to turn on -fdump-tree-all to try to figure out what that
> "control flow in loop" was; it seems to be a guard against the input
> value being negative:
>
>   <bb 3> [local count: 1063004407]:
>   # i_12 = PHI <0(2), i_6(7)>
>   # ivtmp_10 = PHI <32768(2), ivtmp_2(7)>
>   # DEBUG i => i_12
>   # DEBUG BEGIN_STMT
>   _1 = data[i_12];
>   # DEBUG __x => _1
>   # DEBUG BEGIN_STMT
>   _7 = .SQRT (_1);
>   if (_1 u>= 0.0)
>     goto <bb 8>; [99.95%]
>   else
>     goto <bb 4>; [0.05%]
>
>   <bb 8> [local count: 1062472912]:
>   goto <bb 5>; [100.00%]
>
>   <bb 4> [local count: 531495]:
>   __builtin_sqrtf (_1);
>
> I'm not sure where that control flow came from: it isn't in
>   sqrt-test.cc.104t.stdarg
> but is in
>   sqrt-test.cc.105t.cdce
> so I think it's coming from the argument-range code in cdce.
>
> Arguably the location on the statement is wrong: it's on the loop
> header, when it presumably should be on the std::sqrt call.
>
> Shall I file a bugzilla about this?

...and -fno-tree-builtin-call-dce eliminates the control flow, but it
still doesn't vectorize the loop; on godbolt.org with:

  -O3 -mavx2 -fopt-info-vec-all -fno-tree-builtin-call-dce

gcc trunk x86_64 gives:

<source>:8:24: missed: couldn't vectorize loop
/opt/compiler-explorer/gcc-trunk-20190109/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: _7 = __builtin_sqrtf (_1);
<source>:5:13: note: vectorized 0 loops in function.
/opt/compiler-explorer/gcc-trunk-20190109/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: _7 = __builtin_sqrtf (_1);
Compiler returned: 0

...so presumably it doesn't know how to vectorize that builtin call.

Dave
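
For anyone trying to picture the guard in the cdce dump, here is a rough source-level sketch of its shape. It is only an illustration, not what the compiler emits. The names fast_sqrt, sqrt_with_errno_guard, sqrt_all and check_guard_sketch are made up for this example. The sketch also assumes that the cold-path __builtin_sqrtf call exists only so the library routine can report a domain error through errno, which would fit the "statement clobbers memory" message but is not confirmed in this thread.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the internal function .SQRT seen in the dump.
// Portable C++ has no direct spelling for it, so the sketch calls std::sqrt.
inline float fast_sqrt(float x) { return std::sqrt(x); }

// Shape of the guarded call: compute the result unconditionally, then take a
// rarely executed branch only for ordered, negative inputs.
inline float sqrt_with_errno_guard(float x) {
    float result = fast_sqrt(x);
    if (x < 0.0f) {    // logical inverse of the dump's "_1 u>= 0.0" test
        errno = EDOM;  // assumption: models the cold-path library call
    }
    return result;
}

// The loop from the test case, written against the sketch above. The branch
// inside the loop body is the "control flow in loop" the vectorizer rejects.
void sqrt_all(float *values, std::size_t count) {
    for (std::size_t i = 0; i < count; i++)
        values[i] = sqrt_with_errno_guard(values[i]);
}

// Small self-check: non-negative inputs leave errno alone, a negative input
// yields NaN and sets errno to EDOM.
void check_guard_sketch() {
    float values[3] = {4.0f, 0.0f, -1.0f};
    errno = 0;
    sqrt_all(values, 2);
    assert(values[0] == 2.0f && values[1] == 0.0f);
    assert(errno == 0);
    sqrt_all(values + 2, 1);
    assert(std::isnan(values[2]));
    assert(errno == EDOM);
}
```

If the errno assumption holds, building with -fno-math-errno may be worth trying, since it tells GCC that math functions need not set errno. Whether that is enough for the loop to vectorize was not tested in this thread.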