Richard Biener <rguent...@suse.de> writes:
>> On 06.09.2024 at 16:05, Robin Dapp <rdapp....@gmail.com> wrote:
>>
>> Hi,
>>
>> PR112694 shows that we try to create sub-vectors of single-element
>> vectors because can_duplicate_and_interleave_p returns true.
>
> Can we avoid querying the function? CCing Richard who should know more about
> this.
>
> Richard
>
>> The problem resurfaced in PR116611.
>>
>> This patch makes can_duplicate_and_interleave_p return false
>> if count / nvectors == 0 (i.e. nvectors > count) and removes the
>> corresponding check in the riscv backend.
>>
>> This partially gets rid of the FAIL in slp-19a.c. At least when built
>> with cost model we don't have LOAD_LANES anymore. Without cost model,
>> as in the test suite, we choose a different path and still end up with
>> LOAD_LANES.

Could you walk me through the failure in more detail? It sounds like
can_duplicate_and_interleave_p eventually gets to the point of
subdividing the original elements, instead of either combining
consecutive elements (the best case), or leaving them as-is (the
expected fallback for SVE). So those two attempts seem to fail in this
case, whereas an attempt to subdivide the elements succeeds. Is that
right? And if so, why does that happen?

Thanks,
Richard

>>
>> Bootstrapped and regtested on x86 and power10, regtested on
>> rv64gcv_zvfh_zvbb. Still waiting for the aarch64 results.
>>
>> Regards
>> Robin
>>
>> gcc/ChangeLog:
>>
>>     PR target/112694
>>     PR target/116611
>>
>>     * config/riscv/riscv-v.cc (expand_vec_perm_const): Remove early
>>     return.
>>     * tree-vect-slp.cc (can_duplicate_and_interleave_p): Return
>>     false when we cannot create sub-elements.
>> ---
>>  gcc/config/riscv/riscv-v.cc | 9 ---------
>>  gcc/tree-vect-slp.cc        | 4 ++++
>>  2 files changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>> index 9b6c3a21e2d..5c5ed63d22e 100644
>> --- a/gcc/config/riscv/riscv-v.cc
>> +++ b/gcc/config/riscv/riscv-v.cc
>> @@ -3709,15 +3709,6 @@ expand_vec_perm_const (machine_mode vmode, machine_mode op_mode, rtx target,
>>       mask to do the iteration loop control. Just disable it directly.  */
>>    if (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL)
>>      return false;
>> -  /* FIXME: Explicitly disable VLA interleave SLP vectorization when we
>> -     may encounter ICE for poly size (1, 1) vectors in loop vectorizer.
>> -     Ideally, middle-end loop vectorizer should be able to disable it
>> -     itself, We can remove the codes here when middle-end code is able
>> -     to disable VLA SLP vectorization for poly size (1, 1) VF.  */
>> -  if (!BYTES_PER_RISCV_VECTOR.is_constant ()
>> -      && maybe_lt (BYTES_PER_RISCV_VECTOR * TARGET_MAX_LMUL,
>> -                   poly_int64 (16, 16)))
>> -    return false;
>>
>>    struct expand_vec_perm_d d;
>>
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index 3d2973698e2..17b59870c69 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -434,6 +434,10 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
>>    unsigned int nvectors = 1;
>>    for (;;)
>>      {
>> +      /* We need to be able to fuse COUNT / NVECTORS elements together,
>> +         so no point in continuing if there are none.  */
>> +      if (nvectors > count)
>> +        return false;
>>        scalar_int_mode int_mode;
>>        poly_int64 elt_bits = elt_bytes * BITS_PER_UNIT;
>>        if (int_mode_for_size (elt_bits, 1).exists (&int_mode))
>> --
>> 2.46.0
>>
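
For reference while discussing the question above, here is a rough
stand-alone model of the halving loop and of what the new
"nvectors > count" check changes. It is only a sketch and not the GCC
code: sizes are plain integers rather than poly_int64, and the names
can_fuse_model, unit_bytes, target_ok and with_patch are hypothetical.
In particular, target_ok stands in for the integer-mode, vector-type
and permute checks that the real function performs.

```cpp
#include <cassert>
#include <functional>

// Simplified model (assumption: not the GCC implementation).
// COUNT elements of UNIT_BYTES each are first fused into one wide
// element; on failure the candidate size is halved and NVECTORS doubled.
bool
can_fuse_model (unsigned count, unsigned unit_bytes,
                const std::function<bool (unsigned)> &target_ok,
                bool with_patch)
{
  assert (count > 0 && unit_bytes > 0);

  unsigned elt_bytes = count * unit_bytes;
  unsigned nvectors = 1;
  for (;;)
    {
      // Patched behaviour: COUNT / NVECTORS == 0 means the candidate
      // is already smaller than one original element, so give up
      // instead of subdividing.
      if (with_patch && nvectors > count)
        return false;

      // Hypothetical stand-in for the target support checks.
      if (target_ok (elt_bytes))
        return true;

      // An odd size cannot be halved any further.
      if (elt_bytes % 2 != 0)
        return false;
      elt_bytes /= 2;
      nvectors *= 2;
    }
}
```

In this model, a single 8-byte element on a hypothetical target that
only accepts 4-byte elements is accepted without the check (the loop
subdivides the element) and rejected with it. Cases where the loop
combines elements or leaves them as-is behave the same either way.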