Hi,

PR112694 shows that we try to create sub-vectors of single-element
vectors because can_duplicate_and_interleave_p returns true.  The
problem resurfaced in PR116611.
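To make the failure mode concrete, here is a simplified, hypothetical C model of the search loop in can_duplicate_and_interleave_p. It assumes that the loop starts by fusing all COUNT elements into one integer and, on each failed attempt, halves the fused size while doubling nvectors. The names model_mode_ok and model_search, the max_int_bytes stand-in for the target's mode queries, and the check_count flag are invented for illustration. The real function works on poly_int64 sizes and asks the target for integer and vector modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "the target has an integer mode (and a
   matching vector type) for a fused element of FUSED_BYTES bytes".
   We simply pretend integer modes exist up to MAX_INT_BYTES.  */
static bool
model_mode_ok (unsigned int fused_bytes, unsigned int max_int_bytes)
{
  return fused_bytes <= max_int_bytes;
}

/* Simplified model of the search loop (not GCC code).  Returns the
   chosen NVECTORS, or 0 on failure.  On success, *FUSED_BYTES_OUT
   (if non-null) receives the size of one fused element.  CHECK_COUNT
   enables a guard equivalent to the one this patch adds.  */
static unsigned int
model_search (unsigned int count, unsigned int unit_bytes,
              unsigned int max_int_bytes, bool check_count,
              unsigned int *fused_bytes_out)
{
  assert (count > 0 && unit_bytes > 0);

  /* First attempt: fuse all COUNT elements into one integer.  */
  unsigned int elt_bytes = count * unit_bytes;
  unsigned int nvectors = 1;
  for (;;)
    {
      /* Each fused element must hold COUNT / NVECTORS >= 1 original
         elements; otherwise we would be splitting an element.  */
      if (check_count && nvectors > count)
        return 0;

      if (model_mode_ok (elt_bytes, max_int_bytes))
        {
          if (fused_bytes_out)
            *fused_bytes_out = elt_bytes;
          return nvectors;
        }

      /* Retry with half as many bytes per fused element, which needs
         twice as many vectors.  Give up if the size cannot be halved.  */
      if (elt_bytes % 2 != 0)
        return 0;
      elt_bytes /= 2;
      nvectors *= 2;
    }
}
```

With count = 1 and an 8-byte element, the unguarded model settles on a 4-byte "fused" element, i.e. half of a single element, which corresponds to the sub-element case described above. With the guard, the same query fails, while groups with at least nvectors elements behave as before.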
This patch makes can_duplicate_and_interleave_p return false if
count / nvectors would be zero (i.e. if nvectors > count) and removes
the corresponding check in the riscv backend.

This partially gets rid of the FAIL in slp-19a.c.  At least with the
cost model enabled, we no longer use LOAD_LANES.  Without the cost
model, as in the test suite, we take a different path and still end up
with LOAD_LANES.

Bootstrapped and regtested on x86 and power10, regtested on
rv64gcv_zvfh_zvbb.  Still waiting for the aarch64 results.

Regards
 Robin

gcc/ChangeLog:

	PR target/112694
	PR target/116611

	* config/riscv/riscv-v.cc (expand_vec_perm_const): Remove early
	return.
	* tree-vect-slp.cc (can_duplicate_and_interleave_p): Return
	false when we cannot create sub-elements.
---
 gcc/config/riscv/riscv-v.cc | 9 ---------
 gcc/tree-vect-slp.cc        | 4 ++++
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9b6c3a21e2d..5c5ed63d22e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3709,15 +3709,6 @@ expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
      mask to do the iteration loop control. Just disable it directly.  */
   if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
     return false;
-  /* FIXME: Explicitly disable VLA interleave SLP vectorization when we
-     may encounter ICE for poly size (1, 1) vectors in loop vectorizer.
-     Ideally, middle-end loop vectorizer should be able to disable it
-     itself, We can remove the codes here when middle-end code is able
-     to disable VLA SLP vectorization for poly size (1, 1) VF.  */
-  if (!BYTES_PER_RISCV_VECTOR.is_constant ()
-      && maybe_lt (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
-                   poly_int64 (16, 16)))
-    return false;
 
   struct expand_vec_perm_d d;
 
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3d2973698e2..17b59870c69 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -434,6 +434,10 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
   unsigned int nvectors = 1;
   for (;;)
     {
+      /* We need to be able to fuse COUNT / NVECTORS elements together,
+         so no point in continuing if there are none.  */
+      if (nvectors > count)
+        return false;
       scalar_int_mode int_mode;
       poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
       if (int_mode_for_size (elt_bits, 1).exists (&int_mode))
-- 
2.46.0