Hi,

This patch is a follow-up to Richard's patch of
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
vec_construct (initialization of an N-way vector by N scalars) is too low,
which can cause too-aggressive vectorization in particular for N=8 or
higher.  Richard changed the default cost to N-1, which is generally
sensible.  For powerpc I am going with a slightly higher cost of N, which
keeps us from becoming less conservative than the previous estimate when
N=2 (for N=2 the old formula, N/2 + 1, also gave 2).

The whole cost model for powerpc needs more work (in particular we need
to distinguish among processor models), but that's beyond the scope of
this patch.  One thing that I've called out in the comments is that a
vec_construct can have wildly different costs depending on the scalar
elements.  If they are all the same small constant, then we only need
a single splat-immediate instruction; but for V4SF the cost is potentially
higher because of the need to do converts.  For the splat case, we might
want to teach the vectorizer in general to estimate the cost as just
a vector_stmt rather than a vec_construct, but that requires some target
knowledge of which constants can be duplicated with a splat-immediate.

In any case, the purpose of this patch is simply to avoid vectorizing
things we shouldn't when we've undercounted the cost of a vec_construct.
Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions (the vectorization decisions in the test suite have not
changed).  Is this ok for trunk?

Thanks,
Bill


2016-07-15  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
        Improve vec_construct estimate.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 238312)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5138,7 +5138,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
                                    tree vectype, int misalign)
 {
   unsigned elements;
-  tree elem_type;
 
   switch (type_of_cost)
     {
@@ -5245,16 +5244,16 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
         return 2;
 
       case vec_construct:
-       elements = TYPE_VECTOR_SUBPARTS (vectype);
-       elem_type = TREE_TYPE (vectype);
-       /* 32-bit vectors loaded into registers are stored as double
-          precision, so we need n/2 converts in addition to the usual
-          n/2 merges to construct a vector of short floats from them.  */
-       if (SCALAR_FLOAT_TYPE_P (elem_type)
-           && TYPE_PRECISION (elem_type) == 32)
-         return elements + 1;
-       else
-         return elements / 2 + 1;
+       /* This is a rough approximation assuming non-constant elements
+          constructed into a vector via element insertion.  FIXME:
+          vec_construct is not granular enough for uniformly good
+          decisions.  If the initialization is a splat, this is
+          cheaper than we estimate.  If we want to form four SF
+          values into a vector, it's more expensive (we need to
+          copy the four elements into two vector registers,
+          perform two conversions to single precision, and merge
+          the two result vectors).  Improve this someday.  */
+       return TYPE_VECTOR_SUBPARTS (vectype);
 
       default:
         gcc_unreachable ();
