On Mon, 27 Jan 2020, Richard Sandiford wrote: > For the 2s failures in the PR, we have a V4SF VEC_PERM_EXPR in > which the first two elements are duplicates of one element and > the other two are don't-care: > > v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, ?, ? }>; > > The heuristic was to extend this with a blend: > > v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, 2, 3 }>; > > but it seems better to extend a partial duplicate to a full duplicate: > > v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, 1, 1 }>; > > Obviously this is still just a heuristic though. > > I wondered whether to restrict this to two elements or more > but couldn't find any examples in which it made a difference. > Either way should be fine for the purposes of fixing this PR. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. Richard mentioned > in the PR that he had a different fix in mind, but since I'd tested > this overnight, I thought I might as well post it anyway as a possible > belt-and-braces fix. OK to install?
OK. It's a sound heuristic - IIRC I played with some form of this (not sure if I tried to restrict it to all-same elts) and for some cases it was worse on x86. On x86 there's a splat but obviously not from a random vector lane so if one wants to use that it's an extract (move to lane zero) and then splat. Anyway, if we have particular cases that regress with this we want testsuite coverage. [not sure how difficult it is to cover all cases up to N elements with some scripting... looking over the assembly diff would be not fun in any case...] Thanks, Richard. > Richard > > > 2020-01-27 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > PR tree-optimization/92822 > * tree-ssa-forwprop.c (simplify_vector_constructor): When filling > out the don't-care elements of a vector whose significant elements > are duplicates, make the don't-care elements duplicates too. > --- > gcc/tree-ssa-forwprop.c | 22 ++++++++++++++++------ > 1 file changed, 16 insertions(+), 6 deletions(-) > > diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c > index d63e87c8a5b..5203891950a 100644 > --- a/gcc/tree-ssa-forwprop.c > +++ b/gcc/tree-ssa-forwprop.c > @@ -2455,16 +2455,26 @@ simplify_vector_constructor (gimple_stmt_iterator > *gsi) > it and its source indexes to make the permutation supported. > For now it mimics a blend. */ > vec_perm_builder sel (refnelts, refnelts, 1); > + bool all_same_p = true; > for (i = 0; i < elts.length (); ++i) > - sel.quick_push (elts[i].second + elts[i].first * refnelts); > + { > + sel.quick_push (elts[i].second + elts[i].first * refnelts); > + all_same_p &= known_eq (sel[i], sel[0]); > + } > /* And fill the tail with "something". It's really don't care, > and ideally we'd allow VEC_PERM to have a smaller destination > - vector. As heuristic try to preserve a uniform orig[0] which > - facilitates later pattern-matching VEC_PERM_EXPR to a > - BIT_INSERT_EXPR. */ > + vector. As a heuristic: > + > + (a) if what we have so far duplicates a single element, make the > + tail do the same > + > + (b) otherwise preserve a uniform orig[0]. This facilitates > + later pattern-matching of VEC_PERM_EXPR to a BIT_INSERT_EXPR. */ > for (; i < refnelts; ++i) > - sel.quick_push ((elts[0].second == 0 && elts[0].first == 0 > - ? 0 : refnelts) + i); > + sel.quick_push (all_same_p > + ? sel[0] > + : (elts[0].second == 0 && elts[0].first == 0 > + ? 0 : refnelts) + i); > vec_perm_indices indices (sel, orig[1] ? 2 : 1, refnelts); > if (!can_vec_perm_const_p (TYPE_MODE (perm_type), indices)) > return false;