Thank you Biener for the professional suggestion. That make it more clean and simple, and update the PATCH v2 for this already.
Pan -----Original Message----- From: Richard Biener <richard.guent...@gmail.com> Sent: Wednesday, March 29, 2023 4:10 PM To: Li, Pan2 <pan2...@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; Wang, Yanzhang <yanzhang.w...@intel.com> Subject: Re: [PATCH] [RISC-V]: Bugfix for RVV vbool*_t vn_reference_equal. On Wed, Mar 29, 2023 at 9:55 AM Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > From: Pan Li <pan2...@intel.com> > > In most architecture the precision_size of vbool*_t types are > caculated like as the multiple of the type size. For example: > precision_size = type_size * 8 (aka, bit count per bytes). > > Unfortunately, some architecture like RISC-V will adjust the > precision_size for the vbool*_t in order to align the ISA. For example as > below. > type_size = [1, 1, 1, 1, 2, 4, 8] > precision_size = [1, 2, 4, 8, 16, 32, 64] > > Then the precision_size of RISC-V vbool*_t will not be the multiple of > the type_size. This PATCH try to enrich this case when comparing the > vn_reference. > > Given we have the below code: > void test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) { > vbool8_t v1 = *(vbool8_t*)in; > vbool16_t v2 = *(vbool16_t*)in; > > *(vbool8_t*)(out + 100) = v1; > *(vbool16_t*)(out + 200) = v2; > } > > Before this PATCH: > csrr t0,vlenb > slli t1,t0,1 > csrr a3,vlenb > sub sp,sp,t1 > slli a4,a3,1 > add a4,a4,sp > addi a2,a1,100 > vsetvli a5,zero,e8,m1,ta,ma > sub a3,a4,a3 > vlm.v v24,0(a0) > vsm.v v24,0(a2) > vsm.v v24,0(a3) > addi a1,a1,200 > csrr t0,vlenb > vsetvli a4,zero,e8,mf2,ta,ma > slli t1,t0,1 > vlm.v v24,0(a3) > vsm.v v24,0(a1) > add sp,sp,t1 > jr ra > > After this PATCH: > addi a3,a1,100 > vsetvli a4,zero,e8,m1,ta,ma > addi a1,a1,200 > vlm.v v24,0(a0) > vsm.v v24,0(a3) > vsetvli a5,zero,e8,mf2,ta,ma > vlm.v v24,0(a0) > vsm.v v24,0(a1) > ret > > PR 109272 > > gcc/ChangeLog: > > * tree-ssa-sccvn.cc (vn_reference_eq): > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/pr108185-4.c: > * gcc.target/riscv/rvv/base/pr108185-5.c: > * gcc.target/riscv/rvv/base/pr108185-6.c: > --- > .../gcc.target/riscv/rvv/base/pr108185-4.c | 2 +- > .../gcc.target/riscv/rvv/base/pr108185-5.c | 2 +- > .../gcc.target/riscv/rvv/base/pr108185-6.c | 2 +- > gcc/tree-ssa-sccvn.cc | 13 +++++++++++++ > 4 files changed, 16 insertions(+), 3 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-4.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-4.c > index ea3c360d756..e70284fada8 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-4.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-4.c > @@ -65,4 +65,4 @@ test_vbool8_then_vbool64(int8_t * restrict in, > int8_t * restrict out) { > /* { dg-final { scan-assembler-times > {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > /* { dg-final { scan-assembler-times > {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > /* { dg-final { scan-assembler-times > {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > -/* { dg-final { scan-assembler-times > {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */ > +/* { dg-final { scan-assembler-times > +{vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-5.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-5.c > index 9fc659d2402..575a7842cdf 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-5.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-5.c > @@ -65,4 +65,4 @@ test_vbool16_then_vbool64(int8_t * restrict in, > int8_t * restrict out) { > /* { dg-final { scan-assembler-times > {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */ > /* { dg-final { scan-assembler-times > {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > /* { dg-final { scan-assembler-times > {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > -/* { dg-final { scan-assembler-times > {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 14 } } */ > +/* { dg-final { scan-assembler-times > +{vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-6.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-6.c > index 98275e5267d..95a11d37016 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-6.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-6.c > @@ -65,4 +65,4 @@ test_vbool32_then_vbool64(int8_t * restrict in, > int8_t * restrict out) { > /* { dg-final { scan-assembler-times > {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma} 1 } } */ > /* { dg-final { scan-assembler-times > {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */ > /* { dg-final { scan-assembler-times > {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > -/* { dg-final { scan-assembler-times > {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 13 } } */ > +/* { dg-final { scan-assembler-times > +{vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */ > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc index > 6b8d38b270c..c6dfa8fd9a3 100644 > --- a/gcc/tree-ssa-sccvn.cc > +++ b/gcc/tree-ssa-sccvn.cc > @@ -799,6 +799,19 @@ vn_reference_eq (const_vn_reference_t const vr1, > const_vn_reference_t const vr2) > && (TYPE_PRECISION (vr2->type) > != TREE_INT_CST_LOW (TYPE_SIZE (vr2->type)))) > return false; > + else if (VECTOR_BOOLEAN_TYPE_P (vr1->type) > + && VECTOR_BOOLEAN_TYPE_P (vr2->type) > + && expressions_equal_p (TYPE_SIZE (vr1->type), > + TYPE_SIZE (vr2->type)) > + && TYPE_PRECISION (vr1->type) != TYPE_PRECISION (vr2->type)) > + /* For the vbool*_t types in most architectures, the precision size is > + caculated as the multiple of the type size. For example, > + precision_size = type_size * 8 (aka, bit count per bytes). > + Unfortunately, some architecture like RISC-V will adjust the > + precision for the vbool*_t in order to align the ISA and the > + precision will not be the multiple of the type size. Thus, enrich > + the type comparation for this case here. */ > + return false; I would prefer if you write it cheaper, like else if (VECTOR_BOOLEAN_TYPE_P (vr1->type) && VECTOR_BOOLEAN_TYPE_P (vr2->type)) { /* Vector boolean types can have padding, verify we are dealing with the same number of elements. */ if (TYPE_VECTOR_SUBPARTS (vr1->type) != TYPE_VECTOR_SUBPARTS (vr2->type)) return false; } there is IMHO no reason to compare TYPE_SIZE again. Richard. > > i = 0; > j = 0; > -- > 2.34.1 >