Oluwatamilore Adebayo <oluwatamilore.adeb...@arm.com> writes:
> From: oluade01 <oluwatamilore.adeb...@arm.com>
>
> This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).
>
> gcc/ChangeLog:
>
>     * config/aarch64/aarch64-simd.md
>     (vec_widen_<su>abdl_lo_<mode>, vec_widen_<su>abdl_hi_<mode>):
>     Expansions for abd vec widen optabs.
>     (aarch64_<su>abdl<mode>_insn): VQW based abdl RTL.
>     * config/aarch64/iterators.md (USMAX_EXT): Code attributes
>     that give the appropriate extend RTL for the max RTL.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/aarch64/abd_2.c: Added ABDL testcases.
>     * gcc.target/aarch64/abd_3.c: Added ABDL testcases.
>     * gcc.target/aarch64/abd_4.c: Added ABDL testcases.
>     * gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
>     * gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
>     * gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
>     * gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
>     * gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
>     * gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
>     * gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
>     * gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
> ---
>  gcc/config/aarch64/aarch64-simd.md                | 65 ++++++++++++++
>  gcc/config/aarch64/iterators.md                   |  3 +
>  gcc/testsuite/gcc.target/aarch64/abd_2.c          | 33 +++++---
>  gcc/testsuite/gcc.target/aarch64/abd_3.c          | 36 +++++---
>  gcc/testsuite/gcc.target/aarch64/abd_4.c          | 34 ++++----
>  gcc/testsuite/gcc.target/aarch64/abd_none_2.c     | 73 ++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/abd_none_3.c     | 73 ++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/abd_none_4.c     | 84 +++++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/abd_run_1.c      | 29 +++++++
>  .../gcc.target/aarch64/abd_widen_2.c              | 62 ++++++++++++++
>  .../gcc.target/aarch64/abd_widen_3.c              | 62 ++++++++++++++
>  .../gcc.target/aarch64/abd_widen_4.c              | 56 +++++++++++++
>  gcc/testsuite/gcc.target/aarch64/sve/abd_1.c      | 57 +++++++++++--
>  gcc/testsuite/gcc.target/aarch64/sve/abd_2.c      | 47 +++++++++--
>  .../gcc.target/aarch64/sve/abd_none_1.c           | 73 ++++++++++++++++
>  .../gcc.target/aarch64/sve/abd_none_2.c           | 80 ++++++++++++++++++
>  16 files changed, 811 insertions(+), 56 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_4.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index
> bf90202ba2ad3f62f2020486d21256f083effb07..9acf0ab3067a76c0ba49d61e2857558c8482e77d
> 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -975,6 +975,71 @@ (define_expand "aarch64_<su>abdl2<mode>"
>    }
>  )
>
> +(define_insn "aarch64_<su>abdl<mode>_hi_internal"
> +  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> +    (abs:<VWIDE>
> +      (minus:<VWIDE>
> +        (ANY_EXTEND:<VWIDE>
> +          (vec_select:<VHALF>
> +            (match_operand:VQW 1 "register_operand" "w")
> +            (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
> +        (ANY_EXTEND:<VWIDE>
> +          (vec_select:<VHALF>
> +            (match_operand:VQW 2 "register_operand" "w")
> +            (match_dup 3))))))]
> +  "TARGET_SIMD"
> +  "<su>abdl2\t%0.<Vwtype>, %1.<Vtype>, %2.<Vtype>"
> +  [(set_attr "type" "neon_abd_long")]
> +)
> +
> +(define_insn "aarch64_<su>abdl<mode>_lo_internal"
> +  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> +    (minus:<VWIDE>
> +      (USMAX:<VWIDE>
> +        (<USMAX_EXT>:<VWIDE>
> +          (vec_select:<VHALF>
> +            (match_operand:VQW 1 "register_operand" "w")
> +            (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
> +        (<USMAX_EXT>:<VWIDE>
> +          (vec_select:<VHALF>
> +            (match_operand:VQW 2 "register_operand" "w")
> +            (match_dup 3))))
> +      (<max_opp>:<VWIDE>
> +        (<USMAX_EXT>:<VWIDE>
> +          (vec_select:<VHALF> (match_dup 1) (match_dup 3)))
> +        (<USMAX_EXT>:<VWIDE>
> +          (vec_select:<VHALF> (match_dup 2) (match_dup 3))))))]

Sorry, my fault, but I meant the comment about avoiding
(minus (max…) (min…)) for both patterns, not just the first.

I think the review suggestions for 1/2 will change the tests.
For example:

  TEST2(signed, short, char)

shouldn't use IFN_WIDEN_ABD, since:

.L2:
        ldr     q30, [x5, x3]
        ldr     q28, [x4, x3]
        ldr     q31, [x0, x3]
        ldr     q29, [x1, x3]
        add     x3, x3, 32
        sabd    v30.8h, v30.8h, v28.8h
        sabd    v31.8h, v31.8h, v29.8h
        uzp1    v31.16b, v31.16b, v30.16b
        str     q31, [x2], 16
        cmp     x3, 2048
        bne     .L2

is better than:

.L2:
        ldr     q28, [x1, x3]
        ldr     q29, [x0, x3]
        ldr     q30, [x5, x3]
        ldr     q27, [x4, x3]
        add     x3, x3, 32
        sabdl   v31.4s, v29.4h, v28.4h
        sabdl2  v29.4s, v29.8h, v28.8h
        sabdl   v28.4s, v30.4h, v27.4h
        sabdl2  v30.4s, v30.8h, v27.8h
        uzp1    v31.8h, v31.8h, v29.8h
        uzp1    v30.8h, v28.8h, v30.8h
        uzp1    v31.16b, v31.16b, v30.16b
        str     q31, [x2], 16
        cmp     x3, 2048
        bne     .L2

LGTM with the tests updated to match.

Thanks,
Richard
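
For readers following the thread, here is a rough scalar sketch of the two points in the review. It is not part of the patch or its testsuite, and the function names are hypothetical. It models (a) why the absolute difference taken at 16 bits and narrowed once gives the same 8-bit result as widening to 32 bits and narrowing twice, and (b) that max minus min of the extended operands equals the absolute value of their difference, which is why the (abs (minus …)) form can stand in for (minus (max…) (min…)). It assumes the narrowing keeps only the low 8 bits, as uzp1 does in the loops quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Models sabd on .8h followed by a single narrowing: the true absolute
   difference of two int16_t values lies in [0, 65535], so it fits in
   16 unsigned bits before the low 8 bits are kept. */
static uint8_t abd_narrow_from_16(int16_t a, int16_t b)
{
    uint16_t ua = (uint16_t)a, ub = (uint16_t)b;
    uint16_t abd = (a > b) ? (uint16_t)(ua - ub) : (uint16_t)(ub - ua);
    return (uint8_t)abd;
}

/* Models sabdl/sabdl2 to .4s followed by two narrowings: same value,
   computed at 32 bits, then reduced to its low 8 bits. */
static uint8_t abd_narrow_from_32(int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - (int32_t)b;
    if (d < 0)
        d = -d;
    return (uint8_t)d;
}

/* The (minus (max ...) (min ...)) form on sign-extended operands. */
static int32_t abd_max_minus_min(int16_t a, int16_t b)
{
    int32_t x = a, y = b;
    return (x > y ? x : y) - (x < y ? x : y);
}

/* The (abs (minus ...)) form on sign-extended operands. */
static int32_t abd_abs_of_minus(int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - (int32_t)b;
    return d < 0 ? -d : d;
}

/* Samples the int16_t range with the given step and counts any pair on
   which the two narrowing routes, or the two RTL-style forms, disagree. */
static long count_mismatches(int step)
{
    long bad = 0;
    for (int a = INT16_MIN; a <= INT16_MAX; a += step)
        for (int b = INT16_MIN; b <= INT16_MAX; b += step) {
            int16_t sa = (int16_t)a, sb = (int16_t)b;
            if (abd_narrow_from_16(sa, sb) != abd_narrow_from_32(sa, sb))
                bad++;
            if (abd_max_minus_min(sa, sb) != abd_abs_of_minus(sa, sb))
                bad++;
        }
    return bad;
}

/* Spot checks, including the extreme pair whose difference is 65535. */
static void abd_self_check(void)
{
    assert(abd_narrow_from_16(INT16_MIN, INT16_MAX) == 0xff);
    assert(abd_narrow_from_32(INT16_MIN, INT16_MAX) == 0xff);
    assert(abd_max_minus_min(-3, 4) == 7);
    assert(count_mismatches(257) == 0);
}
```

Calling abd_self_check() from a small driver exercises both ends of the int16_t range; a step of 257 starts at INT16_MIN and lands exactly on INT16_MAX. The sketch only covers the signed short-to-char case discussed above.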