https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #15 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:1fe55a1794863b5ad9eeca5062782834716016b2

commit r15-1238-g1fe55a1794863b5ad9eeca5062782834716016b2
Author: Richard Biener <rguent...@suse.de>
Date:   Fri Jun 7 11:29:05 2024 +0200

    tree-optimization/114107 - avoid peeling for gaps in more cases

    The following refactors the code that detects necessary peeling for
    gaps, in particular the PR103116 case where there is no gap but the
    group size is smaller than the vector size.  The testcase in PR114107
    shows we fail to SLP

      for (int i = 0; i < n; i++)
        for (int k = 0; k < 4; k++)
          data[4*i+k] *= factor[i];

    because peeling one scalar iteration isn't enough to cover a gap of
    3 elements of factor[i].  But the code detecting this is placed after
    the logic that detects the cases we already handle properly, since we
    would code generate { factor[i], 0., 0., 0. } for V4DFmode
    vectorization already.  In fact the check that detects when peeling a
    single iteration isn't enough seems improperly guarded, as it should
    apply to all cases.

    I'm not sure we correctly handle VMAT_CONTIGUOUS_REVERSE, but I
    checked that VMAT_STRIDED_SLP and VMAT_ELEMENTWISE correctly avoid
    touching excess elements.

    With this change we can use SLP for the above testcase, and the
    PR103116 testcases no longer require an epilogue on x86-64.  It might
    be different on other targets, so I made those testcases runtime FAIL
    only instead of relying on dump scanning, which there is currently no
    easy way to properly constrain.

            PR tree-optimization/114107
            PR tree-optimization/110445
            * tree-vect-stmts.cc (get_group_load_store_type): Refactor the
            contiguous access case.  Make sure the peeling-for-gaps
            constraints are always tested and consistently relax them when
            we know we can avoid touching excess elements during code
            generation, but rewrite the check to be poly-int aware.
            * gcc.dg/vect/pr114107.c: New testcase.
            * gcc.dg/vect/pr103116-1.c: Adjust.
            * gcc.dg/vect/pr103116-2.c: Likewise.
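For reference, a minimal self-contained sketch of the kernel quoted in the
commit message, written as a standalone C function; the function name, the
use of double, and the flags mentioned below are illustrative assumptions and
are not taken from the actual gcc.dg/vect/pr114107.c testcase:

  /* Each group of 4 consecutive data[] elements is scaled by a single
     factor[i].  When this is SLP-vectorized, the load of factor[i] uses
     only 1 of 4 vector lanes (a gap of 3 elements), which the
     peeling-for-gaps check previously rejected because peeling one
     scalar iteration cannot cover that gap.  */
  void
  scale_groups (double *data, const double *factor, int n)
  {
    for (int i = 0; i < n; i++)
      for (int k = 0; k < 4; k++)
        data[4 * i + k] *= factor[i];
  }

With the fix, compiling such a function with, e.g., -O3 -fopt-info-vec should
report the loop as vectorized (via SLP) on x86-64 without requiring a peeled
epilogue for gaps; as the commit message notes, other targets may behave
differently.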