Oluwatamilore Adebayo <oluwatamilore.adeb...@arm.com> writes:
> From: oluade01 <oluwatamilore.adeb...@arm.com>
>
> This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64-simd.md
>       (vec_widen_<su>abdl_lo_<mode>, vec_widen_<su>abdl_hi_<mode>):
>       Expansions for abd vec widen optabs.
>       (aarch64_<su>abdl<mode>_insn): VQW based abdl RTL.
>       * config/aarch64/iterators.md (USMAX_EXT): Code attributes
>       that give the appropriate extend RTL for the max RTL.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/abd_2.c: Added ABDL testcases.
>       * gcc.target/aarch64/abd_3.c: Added ABDL testcases.
>       * gcc.target/aarch64/abd_4.c: Added ABDL testcases.
>       * gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
>       * gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
>       * gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
>       * gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
>       * gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
>       * gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
>       * gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
>       * gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
> ---
>  gcc/config/aarch64/aarch64-simd.md            | 65 ++++++++++++++
>  gcc/config/aarch64/iterators.md               |  3 +
>  gcc/testsuite/gcc.target/aarch64/abd_2.c      | 33 +++++---
>  gcc/testsuite/gcc.target/aarch64/abd_3.c      | 36 +++++---
>  gcc/testsuite/gcc.target/aarch64/abd_4.c      | 34 ++++----
>  gcc/testsuite/gcc.target/aarch64/abd_none_2.c | 73 ++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/abd_none_3.c | 73 ++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/abd_none_4.c | 84 +++++++++++++++++++
>  gcc/testsuite/gcc.target/aarch64/abd_run_1.c  | 29 +++++++
>  .../gcc.target/aarch64/abd_widen_2.c          | 62 ++++++++++++++
>  .../gcc.target/aarch64/abd_widen_3.c          | 62 ++++++++++++++
>  .../gcc.target/aarch64/abd_widen_4.c          | 56 +++++++++++++
>  gcc/testsuite/gcc.target/aarch64/sve/abd_1.c  | 57 +++++++++++--
>  gcc/testsuite/gcc.target/aarch64/sve/abd_2.c  | 47 +++++++++--
>  .../gcc.target/aarch64/sve/abd_none_1.c       | 73 ++++++++++++++++
>  .../gcc.target/aarch64/sve/abd_none_2.c       | 80 ++++++++++++++++++
>  16 files changed, 811 insertions(+), 56 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_4.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index bf90202ba2ad3f62f2020486d21256f083effb07..9acf0ab3067a76c0ba49d61e2857558c8482e77d 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -975,6 +975,71 @@ (define_expand "aarch64_<su>abdl2<mode>"
>    }
>  )
>  
> +(define_insn "aarch64_<su>abdl<mode>_hi_internal"
> +  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> +     (abs:<VWIDE>
> +       (minus:<VWIDE>
> +         (ANY_EXTEND:<VWIDE>
> +           (vec_select:<VHALF>
> +             (match_operand:VQW 1 "register_operand" "w")
> +             (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
> +         (ANY_EXTEND:<VWIDE>
> +           (vec_select:<VHALF>
> +             (match_operand:VQW 2 "register_operand" "w")
> +             (match_dup 3))))))]
> +  "TARGET_SIMD"
> +  "<su>abdl2\t%0.<Vwtype>, %1.<Vtype>, %2.<Vtype>"
> +  [(set_attr "type" "neon_abd_long")]
> +)
> +
> +(define_insn "aarch64_<su>abdl<mode>_lo_internal"
> +  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> +     (minus:<VWIDE>
> +       (USMAX:<VWIDE>
> +         (<USMAX_EXT>:<VWIDE>
> +           (vec_select:<VHALF>
> +             (match_operand:VQW 1 "register_operand" "w")
> +             (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
> +         (<USMAX_EXT>:<VWIDE>
> +           (vec_select:<VHALF>
> +             (match_operand:VQW 2 "register_operand" "w")
> +             (match_dup 3))))
> +       (<max_opp>:<VWIDE>
> +         (<USMAX_EXT>:<VWIDE>
> +           (vec_select:<VHALF> (match_dup 1) (match_dup 3)))
> +         (<USMAX_EXT>:<VWIDE>
> +           (vec_select:<VHALF> (match_dup 2) (match_dup 3))))))]

Sorry, my fault: I meant the comment about avoiding
(minus (max…) (min…)) to apply to both patterns, not just the first.
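
As a rough illustration of why the two RTL shapes are interchangeable
here: once both inputs have been extended to the wide mode, the
subtraction cannot overflow, so the abs-of-minus form and the
max-minus-min form give the same value. A minimal C sketch of that
equivalence for one signed lane follows. The helper names
(abd_abs_form, abd_maxmin_form) are hypothetical and not part of the
patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: widening ABD written as
   abs (minus (extend a) (extend b)).  */
static int32_t
abd_abs_form (int16_t a, int16_t b)
{
  int32_t wa = a, wb = b;	/* Sign-extend both inputs first.  */
  return abs (wa - wb);		/* Cannot overflow: |wa - wb| <= 65535.  */
}

/* Hypothetical helper: the same value written as
   minus (max (extend a) (extend b)) (min (extend a) (extend b)).  */
static int32_t
abd_maxmin_form (int16_t a, int16_t b)
{
  int32_t wa = a, wb = b;
  int32_t hi = wa > wb ? wa : wb;
  int32_t lo = wa > wb ? wb : wa;
  return hi - lo;
}
```

Both helpers agree for every pair of 16-bit inputs, including the
extremes, because the arithmetic is done entirely in the wider type.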

I think the review suggestions for patch 1/2 will change the tests.
For example:

TEST2(signed, short, char)

shouldn't use IFN_WIDEN_ABD, since:

.L2:
        ldr     q30, [x5, x3]
        ldr     q28, [x4, x3]
        ldr     q31, [x0, x3]
        ldr     q29, [x1, x3]
        add     x3, x3, 32
        sabd    v30.8h, v30.8h, v28.8h
        sabd    v31.8h, v31.8h, v29.8h
        uzp1    v31.16b, v31.16b, v30.16b
        str     q31, [x2], 16
        cmp     x3, 2048
        bne     .L2
 
is better than:

.L2:
        ldr     q28, [x1, x3]
        ldr     q29, [x0, x3]
        ldr     q30, [x5, x3]
        ldr     q27, [x4, x3]
        add     x3, x3, 32
        sabdl   v31.4s, v29.4h, v28.4h
        sabdl2  v29.4s, v29.8h, v28.8h
        sabdl   v28.4s, v30.4h, v27.4h
        sabdl2  v30.4s, v30.8h, v27.8h
        uzp1    v31.8h, v31.8h, v29.8h
        uzp1    v30.8h, v28.8h, v30.8h
        uzp1    v31.16b, v31.16b, v30.16b
        str     q31, [x2], 16
        cmp     x3, 2048
        bne     .L2
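
The reason the widening buys nothing in this case can be sketched in C.
The result is narrowed to 8 bits anyway, and the low 8 bits of the
16-bit absolute difference are the same as the low 8 bits of the
32-bit one. The functions below are hypothetical single-lane models
(not GCC or ACLE APIs), assuming SABD keeps the low 16 bits of |a - b|
and SABDL produces the full value in 32 bits.

```c
#include <stdint.h>

/* Hypothetical model of SABD on one 16-bit lane: |a - b| truncated
   to 16 bits.  */
static uint16_t
sabd16 (int16_t a, int16_t b)
{
  int32_t d = (int32_t) a - (int32_t) b;
  return (uint16_t) (d < 0 ? -d : d);
}

/* Hypothetical model of SABDL on one lane: |a - b| in 32 bits.  */
static uint32_t
sabdl32 (int16_t a, int16_t b)
{
  int32_t d = (int32_t) a - (int32_t) b;
  return (uint32_t) (d < 0 ? -d : d);
}

/* Returns nonzero if both routes narrow to the same 8-bit result.  */
static int
narrowed_results_match (int16_t a, int16_t b)
{
  return (uint8_t) sabd16 (a, b) == (uint8_t) sabdl32 (a, b);
}
```

Since the two routes agree after narrowing, the non-widening sequence
does the same work with fewer instructions per 16 output bytes.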

LGTM with the tests updated to match.

Thanks,
Richard
