The following takes prologue peeling into account when narrowing the maximum number of iterations computed for the epilogue of a vectorized epilogue.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

I did not verify this solves the original aarch64 testcase yet but it looks like a simpler fix and explains why I don't see the issue on the 11 branch which does otherwise the same transforms.

Richard.

2022-04-27  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/105219
	* tree-vect-loop.cc (vect_transform_loop): Disable special
	code narrowing the vectorized epilogue epilogue max
	iterations when peeling for alignment was in effect.

	* gcc.dg/vect/pr105219.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr105219.c | 29 ++++++++++++++++++++++++++++
 gcc/tree-vect-loop.cc                |  2 +-
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr105219.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
new file mode 100644
index 00000000000..0cb7ae2f4d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mtune=intel" { target x86_64-*-* i?86-*-* } } */
+
+#include "tree-vect.h"
+
+int data[128];
+
+void __attribute((noipa))
+foo (int *data, int n)
+{
+  for (int i = 0; i < n; ++i)
+    data[i] = i;
+}
+
+int main()
+{
+  check_vect ();
+  for (int start = 0; start < 16; ++start)
+    for (int n = 1; n < 3*16; ++n)
+      {
+        __builtin_memset (data, 0, sizeof (data));
+        foo (&data[start], n);
+        for (int j = 0; j < n; ++j)
+          if (data[start + j] != j)
+            __builtin_abort ();
+      }
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d7bc34636bd..217abab814b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9977,7 +9977,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                                                          lowest_vf) - 1
           : wi::udiv_floor (loop->nb_iterations_upper_bound + bias_for_lowest,
                             lowest_vf) - 1);
-      if (main_vinfo)
+      if (main_vinfo && !main_vinfo->peeling_for_alignment)
        {
          unsigned int bound;
          poly_uint64 main_iters
-- 
2.34.1