Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

Richard Sandiford Thu, 30 May 2024 13:12:36 -0700

Tamar Christina <tamar.christ...@arm.com> writes:
> [...]
> @@ -6651,8 +6661,10 @@ (define_insn "and<mode>3"
>       (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
>                     (match_operand:PRED_ALL 2 "register_operand")))]
>    "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2   ]
> -     [ Upa     , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
> +  {@ [ cons: =0, 1  , 2  ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa; yes                 ] and\t%0.b, %1/z, %2.b, %2.b
> +     [ ?Upa    , 0  , Upa; yes                 ] ^
> +     [ Upa     , Upa, Upa; no                  ] ^


I think this ought to be:

> +  {@ [ cons: =0, 1  ,  2   ; attrs: pred_clobber ]
> +     [ &Upa    , Upa,  Upa ; yes                 ] and\t%0.b, %1/z, %2.b, %2.b
> +     [ ?Upa    , 0Upa, 0Upa; yes                 ] ^
> +     [ Upa     , Upa,  Upa ; no                  ] ^

so that operand 2 can be tied to operand 0 in the worst case.  Similarly:

>    }
>  )
>  
> @@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred_<optab><mode>_z"
>           (match_operand:PRED_ALL 3 "register_operand"))
>         (match_operand:PRED_ALL 1 "register_operand")))]
>    "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3   ]
> -     [ Upa     , Upa, Upa, Upa ] <logical>\t%0.b, %1/z, %2.b, %3.b
> +  {@ [ cons: =0, 1  , 2  , 3  ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa, Upa; yes                 ] <logical>\t%0.b, %1/z, %2.b, %3.b
> +     [ ?Upa    , 0  , Upa, Upa; yes                 ] ^
> +     [ Upa     , Upa, Upa, Upa; no                  ] ^
>    }
>  )

this would be:

  {@ [ cons: =0, 1   , 2   , 3   ; attrs: pred_clobber ]
     [ &Upa    , Upa , Upa , Upa ; yes                 ] <logical>\t%0.b, %1/z, %2.b, %3.b
     [ ?Upa    , 0Upa, 0Upa, 0Upa; yes                 ] ^
     [ Upa     , Upa , Upa , Upa ; no                  ] ^
  }

Same idea for the rest.

I tried this on:

----------------------------------------------------------------------
#include <arm_sve.h>

void use (svbool_t, svbool_t, svbool_t);

void
f1 (svbool_t p0, svbool_t p1, svbool_t p2, int n, svbool_t *ptr)
{
  while (n--)
    p2 = svand_z (p0, p1, p2);
  *ptr = p2;
}

void
f2 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  *ptr = svand_z (p0, p1, p2);
}

void
f3 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (svand_z (p0, p1, p2), p1, p2);
}

void
f4 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (p0, svand_z (p0, p1, p2), p2);
}

void
f5 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (p0, p1, svand_z (p0, p1, p2));
}
----------------------------------------------------------------------

and it seemed to produce the right output:

----------------------------------------------------------------------
f1:
        cbz     w0, .L2
        sub     w0, w0, #1
        .p2align 5,,15
.L3:
        and     p2.b, p0/z, p1.b, p2.b
        sub     w0, w0, #1
        cmn     w0, #1
        bne     .L3
.L2:
        str     p2, [x1]
        ret

f2:
        and     p3.b, p0/z, p1.b, p2.b
        str     p3, [x0]
        ret

f3:
        and     p0.b, p0/z, p1.b, p2.b
        b       use

f4:
        and     p1.b, p0/z, p1.b, p2.b
        b       use

f5:
        and     p2.b, p0/z, p1.b, p2.b
        b       use
----------------------------------------------------------------------

(with that output coming directly from register allocation (RA),
rather than being cleaned up by a later pass).

> [...]
> @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite "*aarch64_brkn_cc"
>          (match_dup 3)]
>         UNSPEC_BRKN))]
>    "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3 ]
> -     [ Upa     , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b
> +  {@ [ cons: =0, 1  , 2  , 3; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa, 0; yes                 ] brkns\t%0.b, %1/z, %2.b, %0.b
> +     [ ?Upa    , 0  , Upa, 0; yes                 ] ^
> +     [ Upa     , Upa, Upa, 0; no                  ] ^
>    }
>    "&& (operands[4] != CONST0_RTX (VNx16BImode)
>         || operands[5] != CONST0_RTX (VNx16BImode))"

Probably best to leave this out.  All alternatives require operand 3
to match operand 0.  So operands 1 and 2 will only match operand 0
if they're the same as operand 3.  In that case it'd be better to
allow the sharing rather than force the same value to be stored
in two registers.

That is, if op1 != op3 && op2 != op3 then we get what we want
naturally, regardless of tuning.
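
The argument above can be checked with a toy model.  This is only a
sketch, not GCC code: the array layout, the bound N and the function
names are all hypothetical.  Registers and values are small integers,
operand 3 is tied to operand 0, and two inputs may share a register
only if they hold the same value.  The sketch then looks for a case
where operand 1 or 2 overlaps operand 0 while holding a value other
than operand 3's:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not GCC code: register numbers and value ids
   are small integers in [0, N).  Index 0 is the output operand and
   indices 1..3 are the inputs.  */
enum { N = 3 };

/* Two inputs can share a register only if they hold the same value.  */
static bool
inputs_consistent (const int reg[4], const int val[4])
{
  for (int i = 1; i <= 3; i++)
    for (int j = i + 1; j <= 3; j++)
      if (reg[i] == reg[j] && val[i] != val[j])
        return false;
  return true;
}

/* Count assignments in which operand 3 is tied to operand 0, yet
   operand 1 or 2 overlaps operand 0 while holding a value other than
   operand 3's.  */
int
count_counterexamples (void)
{
  int count = 0;
  int reg[4];
  int val[4] = { 0 };	/* val[0] is unused: operand 0 is the output.  */

  /* Enumerate all N^4 register choices and N^3 input values.  */
  for (int code = 0; code < N * N * N * N * N * N * N; code++)
    {
      int c = code;
      for (int i = 0; i < 4; i++, c /= N)
        reg[i] = c % N;
      for (int i = 1; i < 4; i++, c /= N)
        val[i] = c % N;
      assert (c == 0);

      /* Keep only assignments that respect the tie on operand 3.  */
      if (reg[3] != reg[0] || !inputs_consistent (reg, val))
        continue;

      if ((reg[1] == reg[0] && val[1] != val[3])
          || (reg[2] == reg[0] && val[2] != val[3]))
        count++;
    }
  return count;
}
```

Under those assumptions count_counterexamples () returns 0: any overlap
between operand 0 and operands 1 or 2 is a case where the value is
already shared with operand 3, which is the sharing an earlyclobber
would forbid.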

The same thing would apply to the BRKN instances of <brk_reg_con>:

> @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk<brk_op>"
>          (match_operand:VNx16BI 3 "register_operand")]
>         SVE_BRK_BINARY))]
>    "TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3             ]
> -     [ Upa     , Upa, Upa, <brk_reg_con> ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
> +  {@ [ cons: =0,  1 , 2  , 3            ; attrs: pred_clobber ]
> +     [ &Upa    , Upa, Upa, <brk_reg_con>; yes                 ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
> +     [ ?Upa    , 0  , Upa, <brk_reg_con>; yes                 ] ^
> +     [ Upa     , Upa, Upa, <brk_reg_con>; no                  ] ^
>    }
>  )

but I think we should keep this factoring/abstraction and just add
the extra alternatives regardless.  I.e.:

  {@ [ cons: =0, 1   , 2   , 3             ; attrs: pred_clobber ]
     [ &Upa    , Upa , Upa , <brk_reg_con> ; yes                 ] brk<brk_op>\t%0.b, %1/z, %2.b, %<brk_reg_opno>.b
     [ ?Upa    , 0Upa, 0Upa, 0<brk_reg_con>; yes                 ] ^
     [ Upa     , Upa , Upa , <brk_reg_con> ; no                  ] ^

(even though this gives "00", which is valid but redundant).

OK with those changes, thanks.

Richard
