This patch series allows us to vectorize more loops with early exits by forcing peeling for alignment, which guarantees that we can safely read an entire vector iteration without crossing a page boundary.
The motivation is to vectorize search loops such as std::find. These show up in, e.g., xalancbmk from SPEC CPU 2017.

For a single pair of runs of SPEC CPU 2017 on Neoverse V1 with LTO, I see notable improvements in xalancbmk (3.2%) and imagick (4.8%). parest shows a regression of 1.9%. I see the following geomean improvements:

+-----------+---------+
| benchmark | geomean |
+-----------+---------+
| SPECint   | -0.17%  |
| SPECfp    | -0.13%  |
| overall   | -0.15%  |
+-----------+---------+

The series is structured as follows:
 - 1/5 adds the new functionality.
 - 2/5 fixes a latent wrong code (and missed optimization) bug.
 - 3/5 fixes a latent dominator ICE (exposed by 1/5).
 - 4/5 fixes another latent wrong code bug exposed by the changes.
 - 5/5 fixes a costing issue that caused us to miss vectorization with
   larger element sizes.

The patch series survives an O3-bootstrap on aarch64-linux-gnu.

There are currently some testsuite regressions, in the following files:

  gcc.dg/unroll-6.c
  gcc.dg/tree-ssa/cunroll-13.c
  gcc.dg/tree-ssa/cunroll-14.c
  gcc.dg/tree-ssa/predcom-8.c
  gcc.dg/vect/bb-slp-pr65935.c
  gcc.dg/vect/vect-104.c
  gcc.dg/vect/vect-early-break_108-pr113588.c
  gcc.dg/vect/vect-early-break_109-pr113588.c
  gcc.dg/vect/vect-early-break_110-pr113467.c
  gcc.dg/vect/vect-early-break_3.c
  gcc.dg/vect/vect-early-break_65.c
  gcc.dg/vect/vect-early-break_8.c
  gfortran.dg/vect/vect-5.f90
  gfortran.dg/vect/vect-8.f90

Mostly, these seem to be testisms (e.g. we now vectorize two loops where the test expects to see only one). I think the main "real" (non-testism) issues are latent early break profile update bugs exposed by the series. These are failures like:

+FAIL: gcc.dg/tree-ssa/cunroll-14.c scan-tree-dump-not cunroll "Invalid sum"
+FAIL: gcc.dg/tree-ssa/predcom-8.c scan-tree-dump-not pcom "Invalid sum"

I have some WIP patches to address these, but I didn't want to hold up review of this series until those patches are finished. For example, one of the main profile issues I noticed is that multiplicative scaling of BB frequencies (as in scale_loop_profile) doesn't work in the case of multiple exits (provided we want to hold the counts along exit edges fixed). I have a patch that addresses this, but it probably makes most sense to post it once all the profile issues are fixed.

I would appreciate any feedback on the patches at this stage.

Thanks,
Alex

Alex Coplan (4):
  vect: Force alignment peeling to vectorize more early break loops
  vect: Don't guard scalar epilogue for inverted loops
  vect: Ensure we add vector skip guard even when versioning for aliasing
  vect: Also cost gconds for scalar

Tamar Christina (1):
  vect: Fix dominators when adding a guard to skip the vector loop

 .../g++.dg/vect/vect-early-break_6.cc   | 25 +++++
 .../gcc.dg/vect/vect-early-break_130.c  | 91 +++++++++++++++++++
 gcc/tree-vect-data-refs.cc              | 77 +++++++++++++---
 gcc/tree-vect-loop-manip.cc             | 36 ++++++--
 gcc/tree-vect-loop.cc                   |  4 +-
 gcc/tree-vectorizer.h                   |  5 +
 6 files changed, 215 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/vect-early-break_6.cc
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_130.c