On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> After several tries, I found a case that is "close to" having a CSE
> opportunity in RVV for VL vectors:
> 
> void __attribute__((noinline,noclone))
> foo (uint16_t *out, uint16_t *res)
> {
>   int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 };
>   int i;
>   for (i = 0; i < 8; ++i)
>     {
>       if (mask[i])
>         out[i] = 33;
>     }
>   uint16_t o0 = out[0];
>   uint16_t o7 = out[3];
>   uint16_t o14 = out[6];
>   uint16_t o15 = out[7];
>   res[0] = o0;
>   res[2] = o7;
>   res[4] = o14;
>   res[6] = o15;
> }
> 
> The GIMPLE IR:
> 
>   _64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]);
>   vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 0);
>   mask__7.16_56 = vect__1.15_54 != { 0, ... };
>   .LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 0);
> 
> You can see the "len" is always a variable produced by SELECT_VL, so it
> fails to expose a CSE opportunity.
> I also tried ARM SVE: https://godbolt.org/z/63a6WcT9o
> It fails to expose a CSE opportunity as well.
> 
> It seems that it's difficult to get such a CSE opportunity with VL vectors.
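[For readers following the GIMPLE above, the two internal functions can be modelled in plain C. This is a minimal scalar sketch of the assumed semantics -- process at most the vectorization factor's worth of elements per iteration, and store only to active lanes below the length -- not GCC's implementation; the function names merely mirror .SELECT_VL and .LEN_MASK_STORE.]

```c
#include <stdint.h>
#include <stddef.h>

/* Model of .SELECT_VL (remaining, vf): pick how many elements this
   loop iteration processes -- at most the vectorization factor.  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Model of .LEN_MASK_STORE (ptr, len, mask, value): store VALUE to the
   first LEN lanes whose mask element is set; lanes at or beyond LEN are
   left untouched even if their mask element is set.  */
static void
len_mask_store (uint16_t *ptr, size_t len, const int *mask, uint16_t value)
{
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      ptr[i] = value;
}
```

[Because `len` comes out of `select_vl` as a runtime value, a later lookup cannot statically tell which lanes were written -- which is the CSE problem described above.]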
Ah.  Nice example.  This shows we fail to constant fold

  [vec_unpack_lo_expr] { -1, -1, ..., 0, 0, 0, ... }

which means we fail to fold the .MASK_LOAD from 'mask' (that's not
something we support, see my previous answer), and that means the
.MASK_STORE mask doesn't end up constant.

I understand that SVE cannot easily generate all constant [masks], so we
probably shouldn't aggressively perform elimination on constant foldings,
but for actual constant folding of dependent stmts it might be nice to
have the constant folded masks.

I will open a bug report with this testcase.

Richard.

> 
> juzhe.zh...@rivai.ai
> 
> From: Richard Biener
> Date: 2023-06-27 15:47
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
> 
> > Hi, Richi.
> > 
> > When I try vector_cst_encoded_nelts (mask), it is 2 for the testcase
> > I append, but the actual nunits is 8 in that case, so I fail to walk
> > all elements in the analysis.
> > 
> > >> The most simple thing would be to make this all conditional
> > >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > >> asking for VL vector testcases.
> > 
> > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in
> > intrinsics.  I am not sure how to reproduce VL vectors in C code.
> 
> The original motivation was from unrolled vector code for the
> conditional masking case.  Does RVV allow MASK_STORE from
> intrinsics?
> 
> Otherwise how about
> 
>   for (i = 0; i < 8; ++i)
>     a[i] = 42;
>   for (i = 2; i < 6; ++i)
>     b[i] = a[i];
> 
> with complete unrolling disabled via -fdisable-tree-cunrolli?  Both
> loops should get vectorized, hopefully not iterating(?)  Then
> we end up with the overlapping CSE opportunity, no?  Maybe
> it needs support for fixed-size vectors, using VL vectors
> just for the epilog.
> 
> Richard.
> 
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> > 
> > From: Richard Biener
> > Date: 2023-06-27 15:33
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford; pan2.li
> > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> > On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
> > 
> > > From: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>
> > > 
> > > Hi, Richi.
> > > 
> > > I tried to understand your last email and to refactor the do-while
> > > loop using VECTOR_CST_NELTS.
> > > 
> > > This patch works fine for LEN_MASK_STORE and the compiler can CSE
> > > the redundant store.
> > > I have appended a testcase in this patch to test VN for LEN_MASK_STORE.
> > > 
> > > I am not sure whether I am on the same page with you.
> > > 
> > > Feel free to correct me.  Thanks.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >         * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE
> > >         and fix LEN_STORE.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >         * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c:
> > >         New test.
> > > 
> > > ---
> > >  .../rvv/autovec/partial/len_maskstore_vn-1.c | 30 +++++++++++++++++++
> > >  gcc/tree-ssa-sccvn.cc                        | 24 +++++++++++----
> > >  2 files changed, 49 insertions(+), 5 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > 
> > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > new file mode 100644
> > > index 00000000000..0b2d03693dc
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > @@ -0,0 +1,30 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */
> > > +
> > > +void __attribute__((noinline,noclone))
> > > +foo (int *out, int *res)
> > > +{
> > > +  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> > > +  int i;
> > > +  for (i = 0; i < 16; ++i)
> > > +    {
> > > +      if (mask[i])
> > > +        out[i] = i;
> > > +    }
> > > +  int o0 = out[0];
> > > +  int o7 = out[7];
> > > +  int o14 = out[14];
> > > +  int o15 = out[15];
> > > +  res[0] = o0;
> > > +  res[2] = o7;
> > > +  res[4] = o14;
> > > +  res[6] = o15;
> > > +}
> > > +
> > > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two
> > > +   vector iterations.  FRE5 after that should be able to CSE
> > > +   out[7] and out[15], but leave out[0] and out[14] alone.  */
> > > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> > > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> > > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> > > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */
> > > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > > index 11061a374a2..242d82d6274 100644
> > > --- a/gcc/tree-ssa-sccvn.cc
> > > +++ b/gcc/tree-ssa-sccvn.cc
> > > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > >           if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > >             return (void *)-1;
> > >           break;
> > > +       case IFN_LEN_MASK_STORE:
> > > +         len = gimple_call_arg (call, 2);
> > > +         bias = gimple_call_arg (call, 5);
> > > +         if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > > +           return (void *)-1;
> > > +         mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> > > +         mask = vn_valueize (mask);
> > > +         if (TREE_CODE (mask) != VECTOR_CST)
> > > +           return (void *)-1;
> > > +         break;
> > >         default:
> > >           return (void *)-1;
> > >         }
> > > @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > >           tree vectype = TREE_TYPE (def_rhs);
> > >           unsigned HOST_WIDE_INT elsz
> > >             = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> > > +         /* Set the initial len value to UINT_MAX, so mask_idx < actual_len
> > > +            is always true for MASK_STORE.  */
> > > +         unsigned actual_len = UINT_MAX;
> > > +         if (len)
> > > +           actual_len = tree_to_uhwi (len) + tree_to_shwi (bias);
> > > +         unsigned nunits
> > > +           = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]);
> > 
> > No, that's not correct and not what I meant.  There's
> > vector_cst_encoded_nelts (mask), but for example for an
> > all-ones mask that would be 1.  You'd then also not access
> > VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT.
> > What I'm not sure about is how to recover the implicit walk over all
> > elements for the purpose of computing start/length, and how generally
> > this will work for variable-length vectors, where enumerating "all"
> > elements touched is required for correctness.
> > 
> > The most simple thing would be to make this all conditional
> > to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > asking for VL vector testcases.
> > 
> > Richard.
> > 
> > >           if (mask)
> > >             {
> > >               HOST_WIDE_INT start = 0, length = 0;
> > > -             unsigned mask_idx = 0;
> > > -             do
> > > +             for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++)
> > >                 {
> > >                   if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx)))
> > >                     {
> > > @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > >                     }
> > >                   else
> > >                     length += elsz;
> > > -                 mask_idx++;
> > >                 }
> > > -             while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> > >               if (length != 0)
> > >                 {
> > >                   pd.rhs_off = start;
> > > @@ -3389,7 +3403,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > >                 {
> > >                   pd.offset = offset2i;
> > >                   pd.size = (tree_to_uhwi (len)
> > > -                            + -tree_to_shwi (bias)) * BITS_PER_UNIT;
> > > +                            + tree_to_shwi (bias)) * BITS_PER_UNIT;
> > >                   if (BYTES_BIG_ENDIAN)
> > >                     pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
> > >                   else
> > > 
> > 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
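[Background on the patched loop: given a constant store mask, it collects each maximal run of set lanes into one (start, length) range of defined bits, which SCCVN then records as a partial def. That computation can be sketched standalone in plain C; the function name, the output-array interface, and the bit-granular units are illustrative assumptions, not SCCVN's actual interface.]

```c
#include <stddef.h>

/* Turn a constant store mask into contiguous defined ranges.  Each
   maximal run of set lanes becomes one { start, length } pair, both in
   bits, with ELSZ_BITS bits per lane.  Writes the pairs to RANGES and
   returns how many were produced.  */
static unsigned
mask_to_ranges (const int *mask, unsigned nunits, unsigned elsz_bits,
                long ranges[][2])
{
  unsigned n = 0;
  long start = 0, length = 0;
  for (unsigned i = 0; i < nunits; i++)
    {
      if (!mask[i])
        {
          /* An unset lane ends the current run; flush it if non-empty
             and start the next run after this lane.  */
          if (length != 0)
            {
              ranges[n][0] = start;
              ranges[n][1] = length;
              n++;
            }
          start = (long) (i + 1) * elsz_bits;
          length = 0;
        }
      else
        length += elsz_bits;
    }
  /* Flush a run that extends to the last lane.  */
  if (length != 0)
    {
      ranges[n][0] = start;
      ranges[n][1] = length;
      n++;
    }
  return n;
}
```

[A LEN_MASK_STORE with a constant len can be handled by clamping `nunits` to len + bias before the walk, which is what the patch attempts with `actual_len`.]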
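[The vector_cst_encoded_nelts / VECTOR_CST_ENCODED_ELT remarks refer to GCC's compressed VECTOR_CST representation: a constant vector is stored as npatterns interleaved patterns of up to three encoded elements each, and elements beyond the encoded ones either repeat the pattern's last value or continue it as a linear series. The model below is a sketch of that documented scheme, not GCC's code -- the struct layout and decoder are illustrative. It shows why an all-ones mask has only one encoded element while still representing every lane of an arbitrarily long (even variable-length) vector.]

```c
#include <stdint.h>

/* Illustrative model of a VECTOR_CST's compressed encoding.  */
struct vec_cst
{
  unsigned npatterns;         /* number of interleaved patterns */
  unsigned nelts_per_pattern; /* encoded elements per pattern (1..3) */
  const int64_t *encoded;     /* npatterns * nelts_per_pattern values,
                                 grouped by position within the pattern */
};

/* The analogue of vector_cst_encoded_nelts: how many elements are
   stored explicitly, regardless of the represented vector's length.  */
static unsigned
encoded_nelts (const struct vec_cst *v)
{
  return v->npatterns * v->nelts_per_pattern;
}

/* Decode element I of the represented vector.  */
static int64_t
vector_cst_elt (const struct vec_cst *v, unsigned i)
{
  unsigned pattern = i % v->npatterns;
  unsigned pos = i / v->npatterns; /* index within the pattern */
  if (pos < v->nelts_per_pattern)
    return v->encoded[pos * v->npatterns + pattern];
  /* Past the encoded elements: repeat the final value, or with three
     encoded elements continue the linear series.  */
  int64_t last
    = v->encoded[(v->nelts_per_pattern - 1) * v->npatterns + pattern];
  if (v->nelts_per_pattern < 3)
    return last;
  int64_t prev
    = v->encoded[(v->nelts_per_pattern - 2) * v->npatterns + pattern];
  return last + (int64_t) (pos - (v->nelts_per_pattern - 1)) * (last - prev);
}
```

[The correctness problem raised above follows directly: for a fixed number of lanes one can decode every element this way, but for a variable-length vector there is no fixed bound on `i`, so enumerating "all" touched elements is only possible when TYPE_VECTOR_SUBPARTS is constant.]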