Hi, Richi. >> Does RVV allow MASK_STORE from >> intrinsics? No, RVV didn't use any internal_fn in intrinsics.
>>with disabling complete unrolling -fdisable-tree-cunrolli both >>loops should get vectorized, hopefully not iterating(?) Then >>we end up with the overlapping CSE opportunity, no? Maybe >>it needs support for fixed size vectors and using VL vectors >>just for the epilog. I tried on ARM: https://godbolt.org/z/cTET8f7W9 It seems that it can't reproduce CSE opportunity. I am not sure whether I am doing something wrong. Besides, RVV currently can't have VLS modes (V4SI) since LTO issue. I am trying to find a CSE opportunity on VL vectors. Thanks. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-06-27 15:47 To: juzhe.zh...@rivai.ai CC: gcc-patches; richard.sandiford; pan2.li Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote: > Hi, Richi. > > When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but > actual actual nunits is 8 in that case, > then I failed to walk all elements analysis. > > >> The most simple thing would be to make this all conditional > >> to constant TYPE_VECTOR_SUBPARTS. Which is also why I was > >> asking for VL vector testcases. > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in > intrinsics. I am not sure how to reproduce VL vectors > in C code. The original motivation was from unrolled vector code for the conditional masking case. Does RVV allow MASK_STORE from intrinsics? Otherwise how about for (i = 0; i < 8; ++i) a[i] = 42; for (i = 2; i < 6; ++i) b[i] = a[i]; with disabling complete unrolling -fdisable-tree-cunrolli both loops should get vectorized, hopefully not iterating(?) Then we end up with the overlapping CSE opportunity, no? Maybe it needs support for fixed size vectors and using VL vectors just for the epilog. Richard. > Thanks. > > > juzhe.zh...@rivai.ai > > From: Richard Biener > Date: 2023-06-27 15:33 > To: Ju-Zhe Zhong > CC: gcc-patches; richard.sandiford; pan2.li > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE > On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote: > > > From: Ju-Zhe Zhong <juzhe.zh...@rivai.ai> > > > > Hi, Richi. > > > > I tried to understand your last email and to refactor the do-while loop > > using VECTOR_CST_NELTS. > > > > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant > > store. > > I have appended testcase in this patch to test VN for LEN_MASK_STORE. > > > > I am not sure whether I am on the same page with you. > > > > Feel free to correct me, Thanks. > > > > gcc/ChangeLog: > > > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and > > fix LEN_STORE > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New > > test. > > > > --- > > .../rvv/autovec/partial/len_maskstore_vn-1.c | 30 +++++++++++++++++++ > > gcc/tree-ssa-sccvn.cc | 24 +++++++++++---- > > 2 files changed, 49 insertions(+), 5 deletions(-) > > create mode 100644 > > gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > > > diff --git > > a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > new file mode 100644 > > index 00000000000..0b2d03693dc > > --- /dev/null > > +++ > > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > @@ -0,0 +1,30 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param > > riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */ > > + > > +void __attribute__((noinline,noclone)) > > +foo (int *out, int *res) > > +{ > > + int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 }; > > + int i; > > + for (i = 0; i < 16; ++i) > > + { > > + if (mask[i]) > > + out[i] = i; > > + } > > + int o0 = out[0]; > > + int o7 = out[7]; > > + int o14 = out[14]; > > + int o15 = out[15]; > > + res[0] = o0; > > + res[2] = o7; > > + res[4] = o14; > > + res[6] = o15; > > +} > > + > > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two > > + vector iterations. FRE5 after that should be able to CSE > > + out[7] and out[15], but leave out[0] and out[14] alone. */ > > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */ > > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc > > index 11061a374a2..242d82d6274 100644 > > --- a/gcc/tree-ssa-sccvn.cc > > +++ b/gcc/tree-ssa-sccvn.cc > > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void > > *data_, > > if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) > > return (void *)-1; > > break; > > + case IFN_LEN_MASK_STORE: > > + len = gimple_call_arg (call, 2); > > + bias = gimple_call_arg (call, 5); > > + if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) > > + return (void *)-1; > > + mask = gimple_call_arg (call, internal_fn_mask_index (fn)); > > + mask = vn_valueize (mask); > > + if (TREE_CODE (mask) != VECTOR_CST) > > + return (void *)-1; > > + break; > > default: > > return (void *)-1; > > } > > @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void > > *data_, > > tree vectype = TREE_TYPE (def_rhs); > > unsigned HOST_WIDE_INT elsz > > = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))); > > + /* Set initial len value is the UINT_MAX, so mask_idx < actual_len > > + is always true for MASK_STORE. */ > > + unsigned actual_len = UINT_MAX; > > + if (len) > > + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias); > > + unsigned nunits > > + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]); > > No, that's not correct and what I meant. There's > vector_cst_encoded_nelts (mask), but for example for an > all-ones mask that would be 1. You'd then also not access > VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT. What I'm not sure > is how to recover the implicit walk over all elements for the > purpose of computing start/length and how generally this will > work for variable-length vectors where enumerating "all" > elements touched is required for correctness. > > The most simple thing would be to make this all conditional > to constant TYPE_VECTOR_SUBPARTS. Which is also why I was > asking for VL vector testcases. > > Richard. > > > if (mask) > > { > > HOST_WIDE_INT start = 0, length = 0; > > - unsigned mask_idx = 0; > > - do > > + for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++) > > { > > if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx))) > > { > > @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void > > *data_, > > } > > else > > length += elsz; > > - mask_idx++; > > } > > - while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype))); > > if (length != 0) > > { > > pd.rhs_off = start; > > @@ -3389,7 +3403,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void > > *data_, > > { > > pd.offset = offset2i; > > pd.size = (tree_to_uhwi (len) > > - + -tree_to_shwi (bias)) * BITS_PER_UNIT; > > + + tree_to_shwi (bias)) * BITS_PER_UNIT; > > if (BYTES_BIG_ENDIAN) > > pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype)); > > else > > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)