https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56766
--- Comment #26 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to rguent...@suse.de from comment #25)
> > Richi, please note that tree-vectorizer doesn't vectorize bar_v2df, at least
> > there is no VEC_PERM_EXPR in the .optimized dump:
> >
> > void bar_v2df (double * __restrict__ p, double * __restrict q)
> > {
> >   p[0] = p[0] - q[0];
> >   p[1] = p[1] + q[1];
> > }
>
> That's because of (unless you specify -fno-vect-cost-model):
>
> t.c:3:11: note: Cost model analysis:
> Vector inside of basic block cost: 9
> Vector prologue cost: 0
> Vector epilogue cost: 0
> Scalar cost of basic block: 8
> t.c:3:11: note: not vectorized: vectorization is not profitable.
>
> so it computes too high a vectorized cost. This is because the
> target-independent code handling this estimates the cost as
> needing the add, the subtract and the shuffle. The
> target vectorizer cost hook could adjust this to a more sensible
> value if addsubpd is available.

Thanks, -fno-vect-cost-model did the trick here.

> > Another question w.r.t. the foo_* testcases that use __builtin_shuffle:
> >
> > v4sf foo_v4sf (v4sf x, v4sf y)
> > {
> >   v4sf tem0 = x - y;
> >   v4sf tem1 = x + y;
> >   return __builtin_shuffle (tem0, tem1, (v4si) { 0, 5, 2, 7 });
> > }
> >
> > is functionally equivalent to:
> >
> > v4sf foo_v4sf (v4sf x, v4sf y)
> > {
> >   v4sf tem0 = x + y;
> >   v4sf tem1 = x - y;
> >   return __builtin_shuffle (tem0, tem1, (v4si) { 4, 1, 6, 3 });
> > }
> >
> > But the latter construct isn't simplified. Should we declare the canonical
> > form to be the one with "element 0 from the first operand"?
>
> That one is interesting. I'd say we'd need to define a total ordering
> here. Note that a canonical form is only accepted when the target accepts
> it (see the VEC_PERM_EXPR case in fold-const.c).
>
> So, if we can write a function compare_perm_for_canonical (unsigned char
> *sel1, unsigned char *sel2, unsigned n), we could use that to determine
> whether swapping arg0 and arg1 makes the permute mask more canonical.
>
> So yes, we should have a canonical form for the above, and yes, we
> could say that we order by element 0 and, if that is equal, by
> element 1, and so on.

I will open a new PR for this; I think that the proposed patch fixes this one.
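A minimal scalar sketch of the ordering discussed above, assuming plain arrays stand in for vectors. `vec_perm`, `swap_operands_mask`, `should_swap_operands` and `foo_v4sf_variants_agree` are hypothetical helpers, and `compare_perm_for_canonical` only follows the proposed signature (made `const`); none of this is GCC's actual fold-const.c code, and it ignores the target-acceptance check. GCC documents two-operand `__builtin_shuffle` masks as taken modulo 2*N; this sketch simply asserts they are already in range.

```c
#include <assert.h>

/* Hypothetical scalar model of a two-operand permute (VEC_PERM_EXPR /
   __builtin_shuffle): index i < n selects a[i], index n + i selects b[i].
   Assumes every mask element is already in [0, 2n).  */
static void vec_perm (const float *a, const float *b,
                      const unsigned char *sel, unsigned n, float *out)
{
  for (unsigned i = 0; i < n; i++)
    {
      assert (sel[i] < 2 * n);
      out[i] = sel[i] < n ? a[sel[i]] : b[sel[i] - n];
    }
}

/* Mask selecting the same elements after swapping the two operands:
   each index moves to the other half of [0, 2n).  */
static void swap_operands_mask (const unsigned char *sel, unsigned n,
                                unsigned char *out)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = sel[i] < n ? sel[i] + n : sel[i] - n;
}

/* Sketch of compare_perm_for_canonical: lexicographic order on the mask,
   element 0 first, then element 1, and so on.  Returns <0, 0 or >0.  */
static int compare_perm_for_canonical (const unsigned char *sel1,
                                       const unsigned char *sel2, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    if (sel1[i] != sel2[i])
      return sel1[i] < sel2[i] ? -1 : 1;
  return 0;
}

/* Swap arg0/arg1 only if that yields a lexicographically smaller mask.
   Assumes n <= 16 so the scratch buffer is large enough.  */
static int should_swap_operands (const unsigned char *sel, unsigned n)
{
  unsigned char swapped[16];
  assert (n <= 16);
  swap_operands_mask (sel, n, swapped);
  return compare_perm_for_canonical (swapped, sel, n) < 0;
}

/* Both foo_v4sf variants, modelled on arrays.  Returns 1 if they agree.  */
static int foo_v4sf_variants_agree (const float x[4], const float y[4])
{
  static const unsigned char mask1[4] = { 0, 5, 2, 7 };
  static const unsigned char mask2[4] = { 4, 1, 6, 3 };
  float diff[4], sum[4], r1[4], r2[4];
  for (int i = 0; i < 4; i++)
    {
      diff[i] = x[i] - y[i];
      sum[i] = x[i] + y[i];
    }
  vec_perm (diff, sum, mask1, 4, r1);   /* tem0 = x - y, tem1 = x + y */
  vec_perm (sum, diff, mask2, 4, r2);   /* tem0 = x + y, tem1 = x - y */
  for (int i = 0; i < 4; i++)
    if (r1[i] != r2[i])
      return 0;
  return 1;
}
```

With this rule, `should_swap_operands` returns true for { 4, 1, 6, 3 } (its swapped form is { 0, 5, 2, 7 }) and false for { 0, 5, 2, 7 }. Because swapping moves every index to the other half, a mask and its swapped form always differ at element 0, so the lexicographically smaller one is the one taking element 0 from the first operand, which matches the "element 0 from the first operand" proposal.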