Jonathan Wright <jonathan.wri...@arm.com> writes:
> Hi,
>
> As a general principle, vec_duplicate should be as close to the root
> of an expression as possible. Where unary operations have
> vec_duplicate as an argument, these operations should be pushed
> inside the vec_duplicate.
>
> This patch modifies unary operation simplification to push
> sign/zero-extension of a scalar inside vec_duplicate.
>
> This patch also updates all RTL patterns in aarch64-simd.md to use
> the new canonical form.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu and
> x86_64-none-linux-gnu - no issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-19  Jonathan Wright  <jonathan.wri...@arm.com>
>
>         * config/aarch64/aarch64-simd.md: Push sign/zero-extension
>         inside vec_duplicate for all patterns.
>         * simplify-rtx.c (simplify_context::simplify_unary_operation_1):
>         Push sign/zero-extension inside vec_duplicate.
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 74890989cb3045798bf8d0241467eaaf72238297..99a95a54248041906b9a0ad742d3a0dca9733b35
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2092,14 +2092,14 @@
>  
>  (define_insn "aarch64_<su>mlal_hi_n<mode>_insn"
>    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> -        (plus:<VWIDE>
> -          (mult:<VWIDE>
> -              (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> -                 (match_operand:VQ_HSI 2 "register_operand" "w")
> -                 (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
> -              (ANY_EXTEND:<VWIDE> (vec_duplicate:<VCOND>
> -                    (match_operand:<VEL> 4 "register_operand" "<h_con>"))))
> -          (match_operand:<VWIDE> 1 "register_operand" "0")))]
> +     (plus:<VWIDE>
> +       (mult:<VWIDE>
> +           (ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
> +              (match_operand:VQ_HSI 2 "register_operand" "w")
> +              (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
> +              (vec_duplicate:<VWIDE> (ANY_EXTEND:<VWIDE_S>
> +              (match_operand:<VEL> 4 "register_operand" "<h_con>"))))
> +       (match_operand:<VWIDE> 1 "register_operand" "0")))]

Sorry to nitpick, since this is pre-existing, but I think the pattern
would be easier to read with one operation per line.  I.e.:

        (plus:<VWIDE>
          (mult:<VWIDE>
            (ANY_EXTEND:<VWIDE>
              (vec_select:<VHALF>
                (match_operand:VQ_HSI 2 "register_operand" "w")
                (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
            (vec_duplicate:<VWIDE>
              (ANY_EXTEND:<VWIDE_S>
                (match_operand:<VEL> 4 "register_operand" "<h_con>"))))
          (match_operand:<VWIDE> 1 "register_operand" "0")))]

Same for the other patterns with similar doubling of operators.
(It looks like you've fixed other indentation problems though, thanks.)

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 
> 2d169d3f9f70c85d396adaed124b6c52aca98f07..f885816412f7576d2535f827562d2b425a6a553b
>  100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -903,6 +903,18 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
> code, machine_mode mode,
>    rtx temp, elt, base, step;
>    scalar_int_mode inner, int_mode, op_mode, op0_mode;
>  
> +  /* Extending a VEC_DUPLICATE of a scalar should be canonicalized to a
> +     VEC_DUPLICATE of an extended scalar. This is outside of the main switch
> +     as we may wish to push all unary operations inside VEC_DUPLICATE. */
> +  if ((code == SIGN_EXTEND || code == ZERO_EXTEND)
> +      && GET_CODE (op) == VEC_DUPLICATE
> +      && GET_MODE_NUNITS (GET_MODE (XEXP (op, 0))).to_constant () == 1)
> +    {
> +      rtx x = simplify_gen_unary (code, GET_MODE_INNER (mode),
> +                               XEXP (op, 0), GET_MODE (XEXP (op, 0)));
> +      return gen_vec_duplicate (mode, x);
> +    }
> +
>    switch (code)
>      {
>      case NOT:

This is really an extension of the existing code:

  if (VECTOR_MODE_P (mode)
      && vec_duplicate_p (op, &elt)
      && code != VEC_DUPLICATE)
    {
      /* Try applying the operator to ELT and see if that simplifies.
         We can duplicate the result if so.

         The reason we don't use simplify_gen_unary is that it isn't
         necessarily a win to convert things like:

           (neg:V (vec_duplicate:V (reg:S R)))

         to:

           (vec_duplicate:V (neg:S (reg:S R)))

         The first might be done entirely in vector registers while the
         second might need a move between register files.  */
      temp = simplify_unary_operation (code, GET_MODE_INNER (mode),
                                       elt, GET_MODE_INNER (GET_MODE (op)));
      if (temp)
        return gen_vec_duplicate (mode, temp);
    }

As we discussed off-list, the original motivation of this code was
to support folding operations on variable-length vectors.  We didn't
want to change the canonical form for everyone at the same time.

Now that we've agreed what the canonical form should be (on IRC),
the long-term goal should be change the simplify_unary_operation
into a simplify_gen_unary.  I understand you might not want to do
that yet, but I think we should at least integrate the new code
with the existing code.  E.g.:

      if (code == SIGN_EXTEND || code == ZERO_EXTEND)
        temp = simplify_gen_unary (code, GET_MODE_INNER (mode),
                                   elt, GET_MODE_INNER (GET_MODE (op)));
      else
        temp = simplify_unary_operation (code, GET_MODE_INNER (mode),
                                         elt, GET_MODE_INNER (GET_MODE (op)));

Thanks,
Richard

Reply via email to