On Fri, Jul 15, 2016 at 3:55 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote: > Hi, > > This patch is a follow-up to Richard's patch of > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html. The cost of a > vec_construct (initialization of an N-way vector by N scalars) is too low, > which can cause too-aggressive vectorization in particular for N=8 or > higher. Richard changed the default cost to N-1, which is generally > sensible. For powerpc I am going with a slightly higher cost of N, which > will keep us from being less conservative than the previous values when N=2. > > The whole cost model for powerpc needs more work (in particular we need > to distinguish among processor models), but that's beyond the scope of > this patch. One thing that I've called out in the comments is that a > vec_construct can have wildly different costs depending on the scalar > elements. If they are all the same small constant, then we only need > a single splat-immediate instruction; but for V4SF the cost is potentially > higher because of the need to do converts. For the splat case, we might > want to teach the vectorizer in general to estimate the cost as just > a vector_stmt rather than a vec_construct, but that requires some target > knowledge of which constants can be duplicated with a splat-immediate. > > In any case, the purpose of this patch is simply to avoid vectorizing > things we shouldn't when we've undercounted the cost of a vec_construct. > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no > regressions (hence the vectorization decisions in the test suite have > not changed). Is this ok for trunk?
Note that most of the vectorizer testsuite is running with -fno-vect-cost-model, only the costmodel tests are running with the cost model enabled. Richard. > Thanks, > Bill > > > 2016-07-15 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): > Improve vec_construct estimate. > > > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c (revision 238312) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -5138,7 +5138,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_ > tree vectype, int misalign) > { > unsigned elements; > - tree elem_type; > > switch (type_of_cost) > { > @@ -5245,16 +5244,16 @@ rs6000_builtin_vectorization_cost (enum vect_cost_ > return 2; > > case vec_construct: > - elements = TYPE_VECTOR_SUBPARTS (vectype); > - elem_type = TREE_TYPE (vectype); > - /* 32-bit vectors loaded into registers are stored as double > - precision, so we need n/2 converts in addition to the usual > - n/2 merges to construct a vector of short floats from them. */ > - if (SCALAR_FLOAT_TYPE_P (elem_type) > - && TYPE_PRECISION (elem_type) == 32) > - return elements + 1; > - else > - return elements / 2 + 1; > + /* This is a rough approximation assuming non-constant elements > + constructed into a vector via element insertion. FIXME: > + vec_construct is not granular enough for uniformly good > + decisions. If the initialization is a splat, this is > + cheaper than we estimate. If we want to form four SF > + values into a vector, it's more expensive (we need to > + copy the four elements into two vector registers, > + perform two conversions to single precision, and merge > + the two result vectors). Improve this someday. */ > + return TYPE_VECTOR_SUBPARTS (vectype); > > default: > gcc_unreachable (); >