Sorry for the long delay on getting back to this.  I took a look at the
suggested test cases with the cost model available, and did some SPEC
testing to validate the model.  I found that it is still important to
model the 4xfloat case separately to account for conversion from 64-bit
to 32-bit in our internal representation; correcting the calculation
there actually results in no net change, but the commentary is now
better.  The default cost of N-1 that Richard added in his patch applies
well to POWER also, except for V2DI and V2DF modes (N=2) where this
tends to undercount the cost and encourage unprofitable SLP
vectorization in some cases.  Establishing a minimum cost of 2 avoids
test suite regressions and produces acceptable SPEC results.  (So rather
than using N instead of N-1 as in the previous version of this patch,
I'm using N-1 with a floor of 2.)

I looked through gcc.dg/vect/slp-4[35].c with the cost model enabled,
and the results are sensible with these changes.  SPEC results were all
in the noise range.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Ok for trunk?

Thanks,
Bill


2016-08-10  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
        Correct costs for vec_construct.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 239310)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5266,16 +5266,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
         return 2;
 
       case vec_construct:
-       elements = TYPE_VECTOR_SUBPARTS (vectype);
+       /* This is a rough approximation assuming non-constant elements
+          constructed into a vector via element insertion.  FIXME:
+          vec_construct is not granular enough for uniformly good
+          decisions.  If the initialization is a splat, this is
+          cheaper than we estimate.  Improve this someday.  */
        elem_type = TREE_TYPE (vectype);
        /* 32-bit vectors loaded into registers are stored as double
-          precision, so we need n/2 converts in addition to the usual
-          n/2 merges to construct a vector of short floats from them.  */
+          precision, so we need 2 permutes, 2 converts, and 1 merge
+          to construct a vector of short floats from them.  */
        if (SCALAR_FLOAT_TYPE_P (elem_type)
            && TYPE_PRECISION (elem_type) == 32)
-         return elements + 1;
+         return 5;
        else
-         return elements / 2 + 1;
+         return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
 
       default:
         gcc_unreachable ();




On Mon, 2016-07-18 at 14:29 +0200, Richard Biener wrote:
> On Mon, Jul 18, 2016 at 1:56 PM, Segher Boessenkool
> <seg...@kernel.crashing.org> wrote:
> > Hi Bill,
> >
> > On Fri, Jul 15, 2016 at 08:55:08AM -0500, Bill Schmidt wrote:
> >> This patch is a follow-up to Richard's patch of
> >> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
> >> vec_construct (initialization of an N-way vector by N scalars) is too low,
> >> which can cause too-aggressive vectorization in particular for N=8 or
> >> higher.  Richard changed the default cost to N-1, which is generally
> >> sensible.  For powerpc I am going with a slightly higher cost of N, which
> >> will keep us from being less conservative than the previous values when 
> >> N=2.
> >
> >> In any case, the purpose of this patch is simply to avoid vectorizing
> >> things we shouldn't when we've undercounted the cost of a vec_construct.
> >> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> >> regressions (hence the vectorization decisions in the test suite have
> >> not changed).  Is this ok for trunk?
> >
> > Do you also have a testcase where it does matter?  It would be good to
> > add that, then.  Or is it fixing a regression?
> >
> > I know nothing about the cost model, so someone else will have to review,
> > or I can just say "okay" ;-)
> 
> You can maybe look at gcc.dg/vect/slp-4[35].c (and run it with the cost model
> enabled).
> 
> Richard.
> 
> >
> > Segher
> 


Reply via email to