https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
>
> --- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
>
> Thanks for the comments!
>
> > There's predictive commoning which can do similar transforms and runs after
> > vectorization.  It might be it doesn't handle these "simple" cases or that
> > loop dependence info is not up to the task there.
> >
>
> pcom does fix this problem, but it's enabled by default at -O3.  Could it be
> considered to be run at O2?  Or enabled at O2 at some conditions such as: only
> for one loop which skips loop carried optimization and isn't vectorized
> further?

I think pcom should be enabled when vectorization is due to
the interaction with PRE.  It could be tamed down (it can do
peeling/unrolling which is why it is -O3) based on the vectorizer
cost model active if only implicitly enabled ...

Things will get a bit messy I guess.

> > Another option is to avoid the PRE guard with the (very) cheap cost model
> > at the expense of not vectorizing affected loops.
> >
>
> OK, I will benchmark this to see its impact.  For the particular loops in
> 554.roms_r, they can be vectorized at cheap cost model, this bmk got improved
> at cheap cost model on both Power8 and Power9 (a bit though).  So I will just
> test the impact on very cheap cost model.

So another thing to benchmark would be to enable pcom but make sure

  /* Determine the unroll factor, and if the loop should be unrolled, ensure
     that its number of iterations is divisible by the factor.  */
  unroll_factor = determine_unroll_factor (chains);
  scev_reset ();
  unroll = (unroll_factor > 1
            && can_unroll_loop_p (loop, unroll_factor, &desc));

is false for the cheap and very-cheap cost models unless
flag_predictive_commoning is active.

It's probably also a good idea to investigate whether the update_ssa
calls in pcom can be delayed until after all transforms have been done.
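
To make the unroll gating suggested above concrete, here is a rough
standalone sketch of the decision, not a patch against tree-predcom.c.
The enum, the function name pcom_should_unroll and the explicit_pcom
parameter are all hypothetical stand-ins: explicit_pcom assumes that
"flag_predictive_commoning is active" means the user asked for pcom
explicitly rather than getting it implicitly via vectorization, and
can_unroll stands in for the result of can_unroll_loop_p.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vectorizer cost model setting; these are
   NOT GCC's real enumerators.  */
enum cost_model
{
  COST_VERY_CHEAP,
  COST_CHEAP,
  COST_DYNAMIC,
  COST_UNLIMITED
};

/* Return true if pcom may unroll the loop.

   UNROLL_FACTOR models the value computed by determine_unroll_factor.
   CAN_UNROLL models the answer of can_unroll_loop_p.
   EXPLICIT_PCOM is an assumption: true when predictive commoning was
   requested explicitly, false when it is only implicitly enabled.  */
static bool
pcom_should_unroll (unsigned unroll_factor, bool can_unroll,
                    enum cost_model model, bool explicit_pcom)
{
  /* Implicitly enabled pcom under the cheap or very-cheap cost model
     never unrolls.  */
  if (!explicit_pcom
      && (model == COST_VERY_CHEAP || model == COST_CHEAP))
    return false;

  /* Otherwise keep the existing condition from the quoted snippet.  */
  return unroll_factor > 1 && can_unroll;
}
```

With this shape, pcom_should_unroll (2, true, COST_CHEAP, false) is false,
while the same call with explicit_pcom set to true, or with COST_DYNAMIC,
falls through to the existing condition.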