Passed the X86 bootstrap and regression tests. Pan
-----Original Message----- From: Li, Pan2 <pan2...@intel.com> Sent: Tuesday, April 18, 2023 5:12 PM To: Li, Pan2 <pan2...@intel.com>; Richard Biener <richard.guent...@gmail.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; Wang, Yanzhang <yanzhang.w...@intel.com>; richard.sandif...@arm.com Subject: RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion Update the PATCH v2 here, https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615937.html. Running the boostrap/regression test, and keep you posted. Pan -----Original Message----- From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf Of Li, Pan2 via Gcc-patches Sent: Tuesday, April 18, 2023 4:20 PM To: Richard Biener <richard.guent...@gmail.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; Wang, Yanzhang <yanzhang.w...@intel.com>; richard.sandif...@arm.com Subject: RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion I look into the IOR simplification code for this optimization. Mostly I try to implement them with generic vector operations. Pan -----Original Message----- From: Richard Biener <richard.guent...@gmail.com> Sent: Tuesday, April 18, 2023 4:01 PM To: Li, Pan2 <pan2...@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; Wang, Yanzhang <yanzhang.w...@intel.com>; richard.sandif...@arm.com Subject: Re: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion On Tue, Apr 18, 2023 at 9:59 AM Richard Biener <richard.guent...@gmail.com> wrote: > > On Tue, Apr 18, 2023 at 3:31 AM Li, Pan2 via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > Passed the X86 bootstrap and regression tests. > > > > Pan > > > > -----Original Message----- > > From: Li, Pan2 <pan2...@intel.com> > > Sent: Monday, April 17, 2023 10:50 PM > > To: gcc-patches@gcc.gnu.org > > Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; > > Li, Pan2 <pan2...@intel.com>; Wang, Yanzhang > > <yanzhang.w...@intel.com>; richard.sandif...@arm.com > > Subject: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion > > > > From: Pan Li <pan2...@intel.com> > > > > This patch add the optimization for the vector IOR(V1, NOT V1). Assume we > > have below sample code. > > > > vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl) { > > return __riscv_vmorn_mm_b32(v1, v1, vl); } Btw, this also shows you might want to consider inlining the intrinsics in target specific folding or implement them with generic vector operations in the headers. > > > > Before this patch: > > vsetvli a5,zero,e8,mf4,ta,ma > > vlm.v v24,0(a1) > > vsetvli zero,a2,e8,mf4,ta,ma > > vmorn.mm v24,v24,v24 > > vsetvli a5,zero,e8,mf4,ta,ma > > vsm.v v24,0(a0) > > ret > > > > After this patch: > > vsetvli zero,a2,e8,mf4,ta,ma > > vmset.m v24 > > vsetvli a5,zero,e8,mf4,ta,ma > > vsm.v v24,0(a0) > > ret > > > > Or in RTL's perspective, > > from: > > (ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 > > [ v1 ]))) > > to: > > (const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ]) > > > > The similar optimization like VMANDN has enabled already. There should be > > no difference execpt the operator when compare the VMORN and VMANDN for > > such kind of optimization. The patch allows the VECTOR_BOOL IOR(V1, NOT V1) > > simplification besides the existing SCALAR_INT mode. > > > > gcc/ChangeLog: > > > > * machmode.h (VECTOR_BOOL_MODE_P): > > * simplify-rtx.cc (valid_mode_for_ior_simplification_p): > > (simplify_context::simplify_binary_operation_1): > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/base/mask_insn_shortcut.c: > > * gcc.target/riscv/simplify_ior_optimization.c: New test. > > > > Signed-off-by: Pan Li <pan2...@intel.com> > > --- > > gcc/machmode.h | 4 ++ > > gcc/simplify-rtx.cc | 10 +++- > > .../riscv/rvv/base/mask_insn_shortcut.c | 3 +- > > .../riscv/simplify_ior_optimization.c | 50 +++++++++++++++++++ > > 4 files changed, 63 insertions(+), 4 deletions(-) create mode > > 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c > > > > diff --git a/gcc/machmode.h b/gcc/machmode.h index > > f1865c1ef42..771bae89cb7 100644 > > --- a/gcc/machmode.h > > +++ b/gcc/machmode.h > > @@ -134,6 +134,10 @@ extern const unsigned char > > mode_class[NUM_MACHINE_MODES]; > > || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM \ > > || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM) > > > > +/* Nonzero if MODE is a vector bool mode. */ > > +#define VECTOR_BOOL_MODE_P(MODE) \ > > + (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL) \ > > + > > /* Nonzero if MODE is a scalar integral mode. */ > > #define SCALAR_INT_MODE_P(MODE) \ > > (GET_MODE_CLASS (MODE) == MODE_INT \ > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index > > ee75079917f..eff27b835bf 100644 > > --- a/gcc/simplify-rtx.cc > > +++ b/gcc/simplify-rtx.cc > > @@ -57,6 +57,12 @@ neg_poly_int_rtx (machine_mode mode, const_rtx i) > > return immed_wide_int_const (-wi::to_poly_wide (i, mode), mode); > > } > > > > +static bool > > +valid_mode_for_ior_simplification_p (machine_mode mode) { > > + return SCALAR_INT_MODE_P (mode) || VECTOR_BOOL_MODE_P (mode); } > > + > > /* Test whether expression, X, is an immediate constant that represents > > the most significant bit of machine mode MODE. */ > > > > @@ -3332,8 +3338,8 @@ simplify_context::simplify_binary_operation_1 > > (rtx_code code, > > if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1)) > > || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0))) > > && ! side_effects_p (op0) > > - && SCALAR_INT_MODE_P (mode)) > > - return constm1_rtx; > > + && valid_mode_for_ior_simplification_p (mode)) > > for simple predicates like this please do not split them out, it makes > understanding the code more difficult. > > > + return CONST1_RTX (mode); > > shouldn't this be CONSTM1_RTX (mode)? Why is this only valid for > VECTOR_BOOL and not also for VECTOR_INT? You're citing AND and that > does > > /* A & (~A) -> 0 */ > if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1)) > || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0))) > && ! side_effects_p (op0) > && GET_MODE_CLASS (mode) != MODE_CC) > return CONST0_RTX (mode); > > so why differ and not use the same GET_MODE_CLASS (mode) != MODE_CC condition? > > Richard. > > > > > /* (ior A C) is C if all bits of A that might be nonzero are on in > > C. */ > > if (CONST_INT_P (op1) > > diff --git > > a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c > > b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c > > index 83cc4a1b5a5..57d0241675a 100644 > > --- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c > > @@ -233,9 +233,8 @@ vbool64_t > > test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) { > > /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} > > } } */ > > /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} > > } } */ > > /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} > > } } */ > > -/* { dg-final { scan-assembler-times > > {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */ > > /* { dg-final { scan-assembler-not > > {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */ > > /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */ > > -/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */ > > +/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */ > > /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} > > 14 } } */ > > /* { dg-final { scan-assembler-times > > {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */ diff --git > > a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c > > b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c > > new file mode 100644 > > index 00000000000..ec3bd0baf03 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c > > @@ -0,0 +1,50 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */ > > + > > +#include <stdint.h> > > + > > +uint8_t test_simplify_ior_scalar_case_0 (uint8_t a) { > > + return a | ~a; > > +} > > + > > +uint16_t test_simplify_ior_scalar_case_1 (uint16_t a) { > > + return a | ~a; > > +} > > + > > +uint32_t test_simplify_ior_scalar_case_2 (uint32_t a) { > > + return a | ~a; > > +} > > + > > +uint64_t test_simplify_ior_scalar_case_3 (uint64_t a) { > > + return a | ~a; > > +} > > + > > +int8_t test_simplify_ior_scalar_case_4 (int8_t a) { > > + return a | ~a; > > +} > > + > > +int16_t test_simplify_ior_scalar_case_5 (int16_t a) { > > + return a | ~a; > > +} > > + > > +int32_t test_simplify_ior_scalar_case_6 (int32_t a) { > > + return a | ~a; > > +} > > + > > +int64_t test_simplify_ior_scalar_case_7 (int64_t a) { > > + return a | ~a; > > +} > > + > > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */ > > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */ > > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } > > +*/ > > +/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */ > > +/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */ > > -- > > 2.34.1 > >