I meant to mention that I regstrapped this on both POWER8 and POWER9. Bill
> On May 11, 2017, at 10:59 AM, Bill Schmidt <wschm...@linux.vnet.ibm.com> > wrote: > > Hi, > > PR80695 identifies a case (similar to several others we've seen) where SLP > vectorization is too aggressive about vectorizing stores. The problem is > that we undervalue the cost of a vec_construct operation. vec_construct > is the vectorizer's representation for building a vector from scalar > elements. When we construct an integer vector type from its constituent > parts, it requires a direct move from two GPRs (one instruction on P9, > two direct moves and a merge on P8). The high cost of this is not > reflected in the current cost calculation, which only counts the cost > of combining the elements using N-1 inserts. This patch provides a higher > estimate that is closer to reality. Note that all cost estimation for > vectorization is a bit rough, so this should be viewed as a heuristic. > > The patch treats all integer vectors separately from the default case. > There is already special handling for V4SFmode, so this leaves only > V2DFmode in the default case. It was previously established heuristically > that a cost factor of 2 was appropriate for V2DFmode, so that is left > unchanged here; but since V2DFmode is the only default, we can simplify > the calculation to just return 2. > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. > Is this ok for trunk? > > Thanks, > Bill > > > [gcc] > > 2017-05-11 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > PR target/80695 > * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): > Account for direct move costs for vec_construct of integer > vectors. > > [gcc/testsuite] > > 2017-05-11 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > PR target/80695 > * gcc.target/powerpc/pr80695-p8.c: New file. > * gcc.target/powerpc/pr80695-p9.c: New file. > > > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c (revision 247809) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -5849,8 +5849,20 @@ rs6000_builtin_vectorization_cost (enum vect_cost_ > if (SCALAR_FLOAT_TYPE_P (elem_type) > && TYPE_PRECISION (elem_type) == 32) > return 5; > + /* On POWER9, integer vector types are built up in GPRs and then > + use a direct move (2 cycles). For POWER8 this is even worse, > + as we need two direct moves and a merge, and the direct moves > + are five cycles. */ > + else if (INTEGRAL_TYPE_P (elem_type)) > + { > + if (TARGET_P9_VECTOR) > + return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 2; > + else > + return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 11; > + } > else > - return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1); > + /* V2DFmode doesn't need a direct move. */ > + return 2; > > default: > gcc_unreachable (); > Index: gcc/testsuite/gcc.target/powerpc/pr80695-p8.c > =================================================================== > --- gcc/testsuite/gcc.target/powerpc/pr80695-p8.c (nonexistent) > +++ gcc/testsuite/gcc.target/powerpc/pr80695-p8.c (working copy) > @@ -0,0 +1,18 @@ > +/* { dg-do compile { target { powerpc*-*-* } } } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { > "-mcpu=power8" } } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-options "-mcpu=power8 -O3 -fdump-tree-slp-details" } */ > + > +/* PR80695: Verify cost model for vec_construct on POWER8. */ > + > +long a[10] __attribute__((aligned(16))); > + > +void foo (long i, long j, long k, long l) > +{ > + a[6] = i; > + a[7] = j; > + a[8] = k; > + a[9] = l; > +} > + > +/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 > "slp2" } } */ > Index: gcc/testsuite/gcc.target/powerpc/pr80695-p9.c > =================================================================== > --- gcc/testsuite/gcc.target/powerpc/pr80695-p9.c (nonexistent) > +++ gcc/testsuite/gcc.target/powerpc/pr80695-p9.c (working copy) > @@ -0,0 +1,18 @@ > +/* { dg-do compile { target { powerpc*-*-* } } } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { > "-mcpu=power9" } } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-options "-mcpu=power9 -O3 -fdump-tree-slp-details" } */ > + > +/* PR80695: Verify cost model for vec_construct on POWER9. */ > + > +long a[10] __attribute__((aligned(16))); > + > +void foo (long i, long j, long k, long l) > +{ > + a[6] = i; > + a[7] = j; > + a[8] = k; > + a[9] = l; > +} > + > +/* { dg-final { scan-tree-dump-times "vectorization is not profitable" 1 > "slp2" } } */ >