Hi,

V2 of the patch addresses the initial review comments, factors out common code (as we discussed off-list) and adds a set of unit tests to verify the code-generation benefit.
Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-19  Jonathan Wright  <jonathan.wri...@arm.com>

	* config/aarch64/aarch64.c (aarch64_strip_duplicate_vec_elt):
	Define.
	(aarch64_rtx_mult_cost): Traverse RTL tree to prevent
	vec_select cost from being added into Neon multiply cost.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vmul_element_cost.c: New test.

From: Richard Sandiford <richard.sandif...@arm.com>
Sent: 22 July 2021 18:16
To: Jonathan Wright <jonathan.wri...@arm.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Subject: Re: [PATCH] aarch64: Don't include vec_select in SIMD multiply cost

Jonathan Wright <jonathan.wri...@arm.com> writes:
> Hi,
>
> The Neon multiply/multiply-accumulate/multiply-subtract instructions
> can take various forms - multiplying full vector registers of values
> or multiplying one vector by a single element of another. Regardless
> of the form used, these instructions have the same cost, and this
> should be reflected by the RTL cost function.
>
> This patch adds RTL tree traversal in the Neon multiply cost function
> to match the vec_select used by the lane-referencing forms of the
> instructions already mentioned. This traversal prevents the cost of
> the vec_select from being added into the cost of the multiply -
> meaning that these instructions can now be emitted in the combine
> pass as they are no longer deemed prohibitively expensive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-19  Jonathan Wright  <jonathan.wri...@arm.com>
>
>	* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Traverse
>	RTL tree to prevent vec_select from being added into Neon
>	multiply cost.
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index f5b25a7f7041645921e6ad85714efda73b993492..b368303b0e699229266e6d008e28179c496bf8cd 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11985,6 +11985,21 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
>  	    op0 = XEXP (op0, 0);
>  	  else if (GET_CODE (op1) == VEC_DUPLICATE)
>  	    op1 = XEXP (op1, 0);
> +	  /* The same argument applies to the VEC_SELECT when using the lane-
> +	     referencing forms of the MUL/MLA/MLS instructions. Without the
> +	     traversal here, the combine pass deems these patterns too
> +	     expensive and subsequently does not emit the lane-referencing
> +	     forms of the instructions. In addition, canonical form is for the
> +	     VEC_SELECT to be the second argument of the multiply - thus only
> +	     op1 is traversed.  */
> +	  if (GET_CODE (op1) == VEC_SELECT
> +	      && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () == 1)
> +	    op1 = XEXP (op1, 0);
> +	  else if ((GET_CODE (op1) == ZERO_EXTEND
> +		    || GET_CODE (op1) == SIGN_EXTEND)
> +		   && GET_CODE (XEXP (op1, 0)) == VEC_SELECT
> +		   && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () == 1)
> +	    op1 = XEXP (XEXP (op1, 0), 0);

I think this logically belongs in the “GET_CODE (op1) == VEC_DUPLICATE”
if block, since the condition is never true otherwise.  We can probably
skip the GET_MODE_NUNITS tests, but if you'd prefer to keep them, I think
it would be better to add them to the existing VEC_DUPLICATE tests rather
than restrict them to the VEC_SELECT ones.

Also, although this is in Advanced SIMD-specific code, I think it'd be
better to use:

  is_a<scalar_mode> (GET_MODE (op1))

instead of:

  GET_MODE_NUNITS (GET_MODE (op1)).to_constant () == 1

Do you have a testcase?

Thanks,
Richard
rb14675.patch