https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56118
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #4)
> #include <x86intrin.h>
> __m128d f(){
>   __m128d r;
>   r[0]=1;
>   r[1]=2;
>   return r;
> }
>
> Currently, SLP vectorizes it with -fvect-cost-model=unlimited, but not by
> default because:
>
> Vector inside of basic block cost: 1
> Vector prologue cost: 1
> Vector epilogue cost: 0
> Scalar cost of basic block: 2
> r.c:4:9: note: not vectorized: vectorization is not profitable.
>
> And if r is initialized to {3,4} as in the initial testcase, we don't
> vectorize either:
>
> r.c:3:17: note: not vectorized: no vectype for stmt: # .MEM_2 = VDEF
> <.MEM_1(D)>
>   rD.15637 = { 3.0e+0, 4.0e+0 };
> scalar_type: __m128dD.4386
> r.c:3:17: note: not vectorized: not enough data-refs in basic block.

If we fix that (trivial), we run into

t.c:3:15: note: === vect_slp_analyze_data_ref_dependences ===
t.c:3:15: note: can't determine dependence between r and BIT_FIELD_REF <r, 64, 0>

because we end up with a write-write dependence we can't analyze. Of course,
in the end we do not need all dependences, only those for the code motion we
are going to perform.
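To make the cost numbers in the quoted dump concrete, here is a simplified model of the basic-block profitability comparison. It assumes, consistent with the dump (1 + 1 + 0 = 2 versus a scalar cost of 2 being rejected), that the vector version must be strictly cheaper than the scalar one. The function name is hypothetical and this is not GCC's actual code; with -fvect-cost-model=unlimited the comparison is simply not applied, which is why the testcase vectorizes in that mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified model of the BB SLP profitability check.
   Assumption: vectorization is rejected unless the total vector cost
   (inside + prologue + epilogue) is strictly below the scalar cost. */
static bool slp_bb_profitable(int inside_cost, int prologue_cost,
                              int epilogue_cost, int scalar_cost)
{
    assert(inside_cost >= 0 && prologue_cost >= 0 && epilogue_cost >= 0);
    int vector_cost = inside_cost + prologue_cost + epilogue_cost;
    return vector_cost < scalar_cost;
}
```

With the numbers from the dump, slp_bb_profitable(1, 1, 0, 2) is false: the prologue cost of building the vector eats the whole saving, so a tie goes to the scalar code.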
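The last point (only the dependences relevant to the planned code motion matter) can be sketched with a toy model. Everything here is a hypothetical illustration, not GCC's data structures: all statements are stores to the same object, each described by a byte range, and the SLP store group is assumed to be emitted as one vector store at the position of its last member. Earlier members are then sunk past the statements in between, and only those pairs need checking; the initializing store `r = { 3.0e+0, 4.0e+0 }`, which sits before the group, is never crossed, so its write-write dependence with the lane stores would not need to be analyzed at all. Loads and stores to other bases are ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a store to the shared base object (e.g. r)
   covering bytes [offset, offset + size). */
struct store {
    unsigned offset;
    unsigned size;
};

static bool ranges_overlap(struct store a, struct store b)
{
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

static bool in_group(const size_t *group, size_t gsize, size_t idx)
{
    for (size_t k = 0; k < gsize; k++)
        if (group[k] == idx)
            return true;
    return false;
}

/* group[] holds statement indices in increasing order.  The vector store
   is assumed to be placed at group[gsize - 1], so each earlier member moves
   down past the non-group statements between it and that position.  Only
   those crossed statements are checked; statements before the first member
   are never crossed and are ignored. */
static bool group_sinking_is_safe(const struct store *stmts, size_t nstmts,
                                  const size_t *group, size_t gsize)
{
    assert(gsize > 0);
    size_t last = group[gsize - 1];
    assert(last < nstmts);
    for (size_t k = 0; k + 1 < gsize; k++) {
        for (size_t j = group[k] + 1; j < last; j++) {
            if (in_group(group, gsize, j))
                continue;
            if (ranges_overlap(stmts[group[k]], stmts[j]))
                return false; /* sinking would reorder overlapping writes */
        }
    }
    return true;
}
```

For the testcase, the statements are { {0, 16}, {0, 8}, {8, 8} } (the full-vector init, then r[0] and r[1] as 8-byte lanes) with group { 1, 2 }: nothing is crossed, so the motion is safe even though the init store overlaps both lanes.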