[RFC PATCH 0/5] vect: Force peeling for alignment to handle more early break loops

Alex Coplan Mon, 28 Oct 2024 07:54:23 -0700

This patch series allows us to vectorize more loops with early exits by
forcing peeling for alignment to make sure that we're guaranteed to be
able to safely read an entire vector iteration without crossing a page
boundary.


The motivation is to vectorize search loops such as std::find.  This
shows up in (e.g.) xalancbmk from SPEC CPU 2017.
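
As a rough illustration (not taken from the patches themselves), the
kind of loop in question and the alignment argument can be sketched in
C.  The names find_int, peel_count and within_one_page, as well as the
16-byte vector size and 4096-byte page size, are assumptions made for
this sketch only.  The scalar loop stops reading at the first match,
whereas a vectorized loop loads a whole vector per iteration; if every
vector load starts at a vector-aligned address, it cannot straddle a
page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only.  */
#define VEC_BYTES 16u
#define PAGE_BYTES 4096u

/* A search loop with an early exit, in the style of std::find.  */
static ptrdiff_t
find_int (const int *a, size_t n, int key)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] == key)
      return (ptrdiff_t) i;   /* Early break.  */
  return -1;
}

/* Number of scalar iterations to peel so that the next access starts
   at a VEC_BYTES-aligned address.  Assumes ADDR is a multiple of
   ELEM_SIZE and ELEM_SIZE divides VEC_BYTES.  */
static size_t
peel_count (uintptr_t addr, size_t elem_size)
{
  assert (elem_size != 0
          && VEC_BYTES % elem_size == 0
          && addr % elem_size == 0);
  size_t misalign = addr % VEC_BYTES;
  return misalign ? (VEC_BYTES - misalign) / elem_size : 0;
}

/* Nonzero if a VEC_BYTES-wide access starting at ADDR stays within a
   single page.  */
static int
within_one_page (uintptr_t addr)
{
  return addr / PAGE_BYTES == (addr + VEC_BYTES - 1) / PAGE_BYTES;
}
```

With these assumed sizes, an int array starting at address 0x1004 would
need 3 peeled scalar iterations before the first aligned vector load,
and within_one_page holds for every 16-byte-aligned address because the
page size is a multiple of the vector size.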

For a single pair of runs of SPEC CPU 2017 on Neoverse V1 with LTO, I
see notable improvements in xalancbmk (3.2%) and imagick (4.8%).  parest
shows a regression of 1.9%.  I see the following geomean improvements:

+-----------+---------+
| benchmark | geomean |
+-----------+---------+
| SPECint   | -0.17%  |
| SPECfp    | -0.13%  |
| overall   | -0.15%  |
+-----------+---------+

The series is structured as follows:
 - 1/5 adds the new functionality.
 - 2/5 fixes a latent wrong code (and missed optimization) bug.
 - 3/5 fixes a latent dominator ICE (exposed by 1/5).
 - 4/5 fixes another latent wrong code bug exposed by the changes.
 - 5/5 fixes a costing issue that caused us to miss vectorization with larger
   element sizes.

The patch series survives an O3-bootstrap on aarch64-linux-gnu.  There
are currently some testsuite regressions, in the following files:

gcc.dg/unroll-6.c
gcc.dg/tree-ssa/cunroll-13.c
gcc.dg/tree-ssa/cunroll-14.c
gcc.dg/tree-ssa/predcom-8.c
gcc.dg/vect/bb-slp-pr65935.c
gcc.dg/vect/vect-104.c
gcc.dg/vect/vect-early-break_108-pr113588.c
gcc.dg/vect/vect-early-break_109-pr113588.c
gcc.dg/vect/vect-early-break_110-pr113467.c
gcc.dg/vect/vect-early-break_3.c
gcc.dg/vect/vect-early-break_65.c
gcc.dg/vect/vect-early-break_8.c
gfortran.dg/vect/vect-5.f90
gfortran.dg/vect/vect-8.f90

Mostly, these seem to be testisms (e.g. we now vectorize 2 loops, but the
test expects to see only 1).  I think the main "real" (non-testism) issues
are latent early break profile update bugs exposed by the series.  These
are failures like:

+FAIL: gcc.dg/tree-ssa/cunroll-14.c scan-tree-dump-not cunroll "Invalid sum"
+FAIL: gcc.dg/tree-ssa/predcom-8.c scan-tree-dump-not pcom "Invalid sum"

I have some WIP patches to address these, but I didn't want to hold up
review of this series until those patches are finished.

E.g. one of the main profile issues I noticed is that multiplicative
scaling of BB frequencies (as in scale_loop_profile) doesn't work in the
case of multiple exits (provided we want to hold the counts along exit
edges fixed).  I have a patch that addresses this, but it probably makes
most sense to post it once all the profile issues are fixed.
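
One way to see the problem is with a toy model, which is an assumption
of this sketch and not GCC's actual profile representation: a loop
header H with an early exit edge, followed by a body block B that
carries the main exit and the back edge.  Flow conservation requires
count(B) = count(H) - count(early exit).  If every block count is
multiplied by the same factor while the exit edge counts stay fixed,
that equality only survives when the early exit count is zero.  The
struct and function names below are hypothetical.

```c
#include <stdint.h>

/* Toy loop: header H (with an early exit edge) falls through to body
   B, which has the main exit edge and the back edge to H.  */
struct toy_profile
{
  int64_t header;      /* Executions of H.  */
  int64_t early_exit;  /* Count on H's early exit edge (held fixed).  */
  int64_t main_exit;   /* Count on B's main exit edge (held fixed).  */
};

/* Multiplicative scaling of a count by NUM/DEN.  */
static int64_t
scale (int64_t count, int64_t num, int64_t den)
{
  return count * num / den;
}

/* Difference between B's multiplicatively scaled count and the count
   that flow conservation requires once H has been scaled and the
   early exit edge count is left unchanged.  A nonzero result means
   the scaled profile is inconsistent.  */
static int64_t
body_mismatch (const struct toy_profile *p, int64_t num, int64_t den)
{
  int64_t scaled_header = scale (p->header, num, den);
  int64_t scaled_body = scale (p->header - p->early_exit, num, den);
  int64_t required_body = scaled_header - p->early_exit;
  return scaled_body - required_body;
}
```

For example, with header = 1000, early_exit = 60, main_exit = 40 and a
scale factor of 1/4, the scaled body count is 235 but conservation
requires 250 - 60 = 190, a mismatch of 45.  With early_exit = 0 (a
single exit) the mismatch is 0.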

I would appreciate any feedback on the patches at this stage.

Thanks,
Alex

Alex Coplan (4):
  vect: Force alignment peeling to vectorize more early break loops
  vect: Don't guard scalar epilogue for inverted loops
  vect: Ensure we add vector skip guard even when versioning for aliasing
  vect: Also cost gconds for scalar

Tamar Christina (1):
  vect: Fix dominators when adding a guard to skip the vector loop

 .../g++.dg/vect/vect-early-break_6.cc         | 25 +++++
 .../gcc.dg/vect/vect-early-break_130.c        | 91 +++++++++++++++++++
 gcc/tree-vect-data-refs.cc                    | 77 +++++++++++++---
 gcc/tree-vect-loop-manip.cc                   | 36 ++++++--
 gcc/tree-vect-loop.cc                         |  4 +-
 gcc/tree-vectorizer.h                         |  5 +
 6 files changed, 215 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/vect-early-break_6.cc
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_130.c
