https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117502
Bug ID: 117502
Summary: Fail to SLP gcc.target/aarch64/sve/pr95199.c
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

We fail to SLP the non-stride-1 version of gcc.target/aarch64/sve/pr95199.c, where we apply stride versioning. Without SLP we can successfully use gathers, while with SLP we end up with

pr95199.c:8:21: missed: Not using elementwise accesses due to variable vectorization factor.
pr95199.c:10:8: missed: not vectorized: relevant stmt not supported: _4 = *_3;
pr95199.c:8:21: note: unsupported SLP instance starting from: *_3 = _10;
pr95199.c:8:21: missed: unsupported SLP instances

This is easier to see when adding -fno-version-loops-for-strides. We fail to get

pr95199.c:8:21: note: ==> examining statement: _4 = *_3;
pr95199.c:8:21: note: using gather/scatter for strided/grouped access, scale = 
pr95199.c:8:21: note: vect_model_load_cost: inside_cost = 2, prologue_cost = 0

The reason is that we're using VMAT_STRIDED_SLP and consider gather only for VMAT_ELEMENTWISE, failing to realize that the caller (get_load_store_type) will reject both in the case of a variable-length access. There's also practically no difference between VMAT_ELEMENTWISE and VMAT_STRIDED_SLP for single-element accesses.
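For readers unfamiliar with the terms, here is a minimal sketch of the kind of loop involved. It is not the contents of pr95199.c: the function names, element type and arithmetic are made up for illustration. The first function is a single-element access with a runtime stride. The second is a hand-written analogue of what stride versioning produces: a stride-1 copy guarded by a runtime check, plus the generic strided copy that this report is about.

```c
/* Hypothetical example; NOT the actual pr95199.c testcase.  */

/* Single-element access with a stride only known at run time.
   With a variable-length vector ISA such as SVE, vectorizing this
   needs gather loads and scatter stores.  */
void
scale_strided (int *restrict dst, const int *restrict src,
               long stride, long n)
{
  for (long i = 0; i < n; ++i)
    dst[i * stride] = src[i * stride] * 2;
}

/* Hand-written analogue of stride versioning: one copy of the loop
   specialized for stride == 1, one generic copy.  The generic
   ("non-stride 1") copy is the one discussed in this report.  */
void
scale_versioned (int *restrict dst, const int *restrict src,
                 long stride, long n)
{
  if (stride == 1)
    {
      /* Contiguous accesses: plain vector loads and stores suffice.  */
      for (long i = 0; i < n; ++i)
        dst[i] = src[i] * 2;
    }
  else
    {
      /* Strided accesses: same access pattern as scale_strided.  */
      for (long i = 0; i < n; ++i)
        dst[i * stride] = src[i * stride] * 2;
    }
}
```

Both functions compute the same result for any stride. Compiling a loop like scale_strided for an SVE target with -fno-version-loops-for-strides leaves only the strided form for the vectorizer, which is the situation described above. Whether this particular sketch triggers the same diagnostics has not been checked.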