Thanks, Just archiving a version with more tests as requested.
I will assume the OK still stands. Regards, Tamar > -----Original Message----- > From: Kyrylo Tkachov <kyrylo.tkac...@arm.com> > Sent: Tuesday, October 12, 2021 1:19 PM > To: Andrew Pinski <pins...@gmail.com> > Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; > apin...@marvell.com; Richard Earnshaw <richard.earns...@arm.com>; nd > <n...@arm.com>; Marcus Shawcroft <marcus.shawcr...@arm.com>; Richard > Sandiford <richard.sandif...@arm.com> > Subject: RE: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt > > > > > -----Original Message----- > > From: Andrew Pinski <pins...@gmail.com> > > Sent: Monday, October 11, 2021 8:56 PM > > To: Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > Cc: Tamar Christina <tamar.christ...@arm.com>; > > gcc-patches@gcc.gnu.org; apin...@marvell.com; Richard Earnshaw > > <richard.earns...@arm.com>; nd <n...@arm.com>; Marcus Shawcroft > > <marcus.shawcr...@arm.com>; Richard Sandiford > > <richard.sandif...@arm.com> > > Subject: Re: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt > > > > On Thu, Sep 30, 2021 at 2:28 AM Kyrylo Tkachov via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > -----Original Message----- > > > > From: Tamar Christina <tamar.christ...@arm.com> > > > > Sent: Wednesday, September 29, 2021 5:20 PM > > > > To: gcc-patches@gcc.gnu.org > > > > Cc: nd <n...@arm.com>; Richard Earnshaw > > <richard.earns...@arm.com>; > > > > Marcus Shawcroft <marcus.shawcr...@arm.com>; Kyrylo Tkachov > > > > <kyrylo.tkac...@arm.com>; Richard Sandiford > > > > <richard.sandif...@arm.com> > > > > Subject: [PATCH 3/7]AArch64 Add pattern for sshr to cmlt > > > > > > > > Hi All, > > > > > > > > This optimizes signed right shift by BITSIZE-1 into a cmlt > > > > operation which > > is > > > > more optimal because generally compares have a higher throughput > > > > than shifts. > > > > > > > > On AArch64 the result of the shift would have been either -1 or 0 > > > > which is > > the > > > > results of the compare. > > > > > > > > i.e. > > > > > > > > void e (int * restrict a, int *b, int n) { > > > > for (int i = 0; i < n; i++) > > > > b[i] = a[i] >> 31; > > > > } > > > > > > > > now generates: > > > > > > > > .L4: > > > > ldr q0, [x0, x3] > > > > cmlt v0.4s, v0.4s, #0 > > > > str q0, [x1, x3] > > > > add x3, x3, 16 > > > > cmp x4, x3 > > > > bne .L4 > > > > > > > > instead of: > > > > > > > > .L4: > > > > ldr q0, [x0, x3] > > > > sshr v0.4s, v0.4s, 31 > > > > str q0, [x1, x3] > > > > add x3, x3, 16 > > > > cmp x4, x3 > > > > bne .L4 > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > > > Ok for master? > > > > > > This should be okay (either a win or neutral) for Arm Cortex and > > > Neoverse > > cores so I'm tempted to not ask for a CPU-specific tunable to guard it > > to keep the code clean. > > > Andrew, would this change be okay from a Thunder X line perspective? > > > > I don't know about ThunderX2 but here are the details for ThunderX1 > > (and OcteonX1) and OcteonX2: > > The sshr and cmlt are handled the same in the pipeline as far as I can tell. > > > > Thanks for the info. > This patch is ok. > Kyrill > > > Thanks, > > Andrew > > > > > > > > > Thanks, > > > Kyrill > > > > > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > * config/aarch64/aarch64-simd.md (aarch64_simd_ashr<mode>): > > > > Add case cmp > > > > case. > > > > * config/aarch64/constraints.md (D1): New. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.target/aarch64/shl-combine-2.c: New test. > > > > > > > > --- inline copy of patch -- > > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > > > > b/gcc/config/aarch64/aarch64-simd.md > > > > index > > > > > > 300bf001b59ca7fa197c580b10adb7f70f20d1e0..19b2d0ad4dab4d574269829 > > > > 7ded861228ee22007 100644 > > > > --- a/gcc/config/aarch64/aarch64-simd.md > > > > +++ b/gcc/config/aarch64/aarch64-simd.md > > > > @@ -1127,12 +1127,14 @@ (define_insn "aarch64_simd_lshr<mode>" > > > > ) > > > > > > > > (define_insn "aarch64_simd_ashr<mode>" > > > > - [(set (match_operand:VDQ_I 0 "register_operand" "=w") > > > > - (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w") > > > > - (match_operand:VDQ_I 2 "aarch64_simd_rshift_imm" > > > > "Dr")))] > > > > + [(set (match_operand:VDQ_I 0 "register_operand" "=w,w") > > > > + (ashiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" > "w,w") > > > > + (match_operand:VDQ_I 2 "aarch64_simd_rshift_imm" > > > > "D1,Dr")))] > > > > "TARGET_SIMD" > > > > - "sshr\t%0.<Vtype>, %1.<Vtype>, %2" > > > > - [(set_attr "type" "neon_shift_imm<q>")] > > > > + "@ > > > > + cmlt\t%0.<Vtype>, %1.<Vtype>, #0 sshr\t%0.<Vtype>, %1.<Vtype>, > > > > + %2" > > > > + [(set_attr "type" "neon_compare<q>,neon_shift_imm<q>")] > > > > ) > > > > > > > > (define_insn "*aarch64_simd_sra<mode>" > > > > diff --git a/gcc/config/aarch64/constraints.md > > > > b/gcc/config/aarch64/constraints.md > > > > index > > > > > > 3b49b452119c49320020fa9183314d9a25b92491..18630815ffc13f2168300a89 > > > > 9db69fd428dfb0d6 100644 > > > > --- a/gcc/config/aarch64/constraints.md > > > > +++ b/gcc/config/aarch64/constraints.md > > > > @@ -437,6 +437,14 @@ (define_constraint "Dl" > > > > (match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op), > > > > true)"))) > > > > > > > > +(define_constraint "D1" > > > > + "@internal > > > > + A constraint that matches vector of immediates that is bits(mode)-1." > > > > + (and (match_code "const,const_vector") > > > > + (match_test "aarch64_const_vec_all_same_in_range_p (op, > > > > + GET_MODE_UNIT_BITSIZE (mode) - 1, > > > > + GET_MODE_UNIT_BITSIZE (mode) - 1)"))) > > > > + > > > > (define_constraint "Dr" > > > > "@internal > > > > A constraint that matches vector of immediates for right shifts." > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c > > > > b/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c > > > > new file mode 100644 > > > > index > > > > > > 0000000000000000000000000000000000000000..bdfe35d09ffccc7928947c9e > > > > 57f1034f7ca2c798 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/aarch64/shl-combine-2.c > > > > @@ -0,0 +1,12 @@ > > > > +/* { dg-do assemble } */ > > > > +/* { dg-options "-O3 --save-temps > > > > +--param=vect-epilogues-nomask=0" } > > */ > > > > + > > > > +void e (int * restrict a, int *b, int n) { > > > > + for (int i = 0; i < n; i++) > > > > + b[i] = a[i] >> 31; > > > > +} > > > > + > > > > +/* { dg-final { scan-assembler-times {\tcmlt\t} 1 } } */ > > > > +/* { dg-final { scan-assembler-not {\tsshr\t} } } */ > > > > + > > > > > > > > > > > > --
rb14894.patch
Description: rb14894.patch