Thanks! This was committed to trunk last week as r247671. As we discussed offline, I've also backported to GCC 7 (r248010) and GCC 6 (r248011).
Bill > On May 5, 2017, at 10:30 AM, Segher Boessenkool <seg...@kernel.crashing.org> > wrote: > > Hi Bill, > > On Wed, May 03, 2017 at 02:43:09PM -0500, Bill Schmidt wrote: >> We recently became aware of some poor code generation as a result of >> unprofitable (for POWER) loop vectorization. When a loop is simply copying >> data with 64-bit loads and stores, vectorizing with 128-bit loads and stores >> generally does not provide any benefit on modern POWER processors. >> Furthermore, if there is a requirement to version the loop for aliasing, >> alignment, etc., the cost of the versioning test is almost certainly a >> performance loss for such loops. The user code example included such a copy >> loop, executed only a few times on average, within an outer loop that was >> executed many times on average, causing a tremendous slowdown. >> >> This patch very specifically targets these kinds of loops and no others, >> and artificially inflates the vectorization cost to ensure vectorization >> does not appear profitable. This is done within the target model cost >> hooks to avoid affecting other targets. A new test case is included that >> demonstrates the refusal to vectorize. >> >> We've done SPEC performance testing to verify that the patch does not >> degrade such workloads. Results were all in the noise range. The >> customer code performance loss was verified to have been reversed. >> >> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. >> Is this ok for trunk? > >> 2017-05-03 Bill Schmidt <wschm...@linux.vnet.ibm.com> >> >> * config/rs6000/rs6000.c (rs6000_vect_nonmem): New static var. >> (rs6000_init_cost): Initialize rs6000_vect_nonmem. >> (rs6000_add_stmt_cost): Update rs6000_vect_nonmem. >> (rs6000_finish_cost): Avoid vectorizing simple copy loops with >> VF=2 that require versioning. >> >> [gcc/testsuite] >> >> 2017-05-03 Bill Schmidt <wschm...@linux.vnet.ibm.com> >> >> * gcc.target/powerpc/veresioned-copy-loop.c: New file. > >> --- gcc/config/rs6000/rs6000.c (revision 247560) >> +++ gcc/config/rs6000/rs6000.c (working copy) >> @@ -5873,6 +5873,8 @@ rs6000_density_test (rs6000_cost_data *data) >> >> /* Implement targetm.vectorize.init_cost. */ >> >> +static bool rs6000_vect_nonmem; > > Please put a comment on this, saying what it is for. > >> + /* Check whether we're doing something other than just a copy loop. >> + Not all such loops may be profitably vectorized; see >> + rs6000_finish_cost. */ >> + if ((where == vect_body >> + && (kind == vector_stmt || kind == vec_to_scalar || kind == vec_perm >> + || kind == vec_promote_demote || kind == vec_construct >> + || kind == scalar_to_vec)) >> + || (where != vect_body >> + && (kind == vec_to_scalar || kind == vec_perm >> + || kind == vec_promote_demote || kind == vec_construct >> + || kind == scalar_to_vec))) >> + rs6000_vect_nonmem = true; > > Perhaps > > + if ((kind == vec_to_scalar || kind == vec_perm > + || kind == vec_promote_demote || kind == vec_construct > + || kind == scalar_to_vec) > + || (where == vect_body && kind == vector_stmt)) >> + rs6000_vect_nonmem = true; > > if you agree that is clearer. > > Okay for trunk with the comment added, and the condition either or not > simplified. Thanks, > > > Segher >