> -----Original Message-----
> From: Andrew Pinski <pins...@gmail.com>
> Sent: Thursday, October 27, 2022 4:15 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> <richard.earns...@arm.com>; nd <n...@arm.com>; Richard Sandiford
> <richard.sandif...@arm.com>; Marcus Shawcroft
> <marcus.shawcr...@arm.com>
> Subject: Re: [PATCH 2/3]AArch64 Promote function arguments using a
> paradoxical subreg when beneficial.
> 
> On Fri, May 13, 2022 at 10:14 AM Tamar Christina via Gcc-patches <gcc-
> patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > PROMOTE_MODE always promotes 8 and 16-bit parameters to 32 bits.
> > This promotion is not required by the ABI, which states:
> >
> >
> > ```
> > C.9     If the argument is an Integral or Pointer Type, the size of the
> >         argument is less than or equal to 8 bytes and the NGRN is less
> >         than 8, the argument is copied to the least significant bits in
> >         x[NGRN]. The NGRN is incremented by one. The argument has now
> >         been allocated.
> >
> > C.16    If the size of the argument is less than 8 bytes then the size
> >         of the argument is set to 8 bytes. The effect is as if the
> >         argument was copied to the least significant bits of a 64-bit
> >         register and the remaining bits filled with unspecified values
> > ```
> >
> > That is, the upper bits in the registers are unspecified and callees
> > cannot rely on them having any particular value.
> >
> > This means that we can avoid the promotion and still get correct code
> > as the language level promotion rules require values to be extended
> > when the bits are significant.
> >
> > So if we are e.g. OR-ing two 8-bit values no extend is needed as the
> > top bits are irrelevant.  If we are doing e.g. addition, then the top
> > bits *might* be relevant depending on the result type.  But the middle
> > end will always contain the appropriate extend in those cases.
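> >
> > For instance, OR-ing two 8-bit values (an illustrative example of my
> > own, not one from the testsuite) needs no extend at all, since only the
> > low 8 bits of the inputs can affect the low 8 bits of the result:
> >
> > uint8_t f_or (uint8_t a, uint8_t b){
> >     return a | b;
> > }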
> >
> > The mid-end also has optimizations around this assumption and the
> > AArch64 port actively undoes them.
> >
> > So for instance
> >
> > uint16_t fd (uint8_t xr){
> >     return xr + 1;
> > }
> >
> > uint8_t fd2 (uint8_t xr){
> >     return xr + 1;
> > }
> >
> > should produce
> >
> > fd:                                     // @fd
> >         and     w8, w0, #0xff
> >         add     w0, w8, #1
> >         ret
> > fd2:                                    // @fd2
> >         add     w0, w0, #1
> >         ret
> >
> > like clang does instead of
> >
> > fd:
> >         and     w0, w0, 255
> >         add     w0, w0, 1
> >         ret
> > fd2:
> >         and     w0, w0, 255
> >         add     w0, w0, 1
> >         ret
> >
> > like we do now.  Removing this forced promotion maintains correctness
> > while fixing various codegen defects.  It also brings us in line with
> > clang.
> >
> > Note that C, C++, Fortran, etc. all specify what must happen w.r.t.
> > extensions for e.g. array indexing and pointer arithmetic, so we never
> > generate incorrect code.
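> >
> > As a small sketch (my own illustration, not part of the patch), when an
> > 8-bit argument is used as an array index the language already requires
> > it to be converted to the index type, so the mid-end contains the
> > extension and dropping the ABI-level promotion cannot change the result:
> >
> > int read_elem (int *p, uint8_t i){
> >     return p[i];
> > }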
> >
> > There is however a second reason for doing this promotion: RTL efficiency.
> > Promoting at argument setup means we do not have to extend the values to
> > SImode every time we want to use them in an instruction and then truncate
> > again afterwards.
> >
> > To get both the efficiency and the simpler RTL we can instead promote
> > using a paradoxical subreg.  This patch implements the hook for AArch64
> > and adds an explicit opt-out for values that feed into comparisons.  This
> > is done because:
> >
> > 1. Our comparison patterns already allow us to absorb the zero extend.
> > 2. The extension allows us to use cbz/cbnz/tbz etc.  In cases such as
> >
> > int foo (char a, char b)
> > {
> >    if (a)
> >      {
> >        if (b)
> >          bar1 ();
> >        else
> >          ...
> >      }
> >    else
> >      {
> >        if (b)
> >          bar2 ();
> >        else
> >          ...
> >      }
> > }
> >
> > by zero extending the value we can avoid having to repeatedly test the
> > value before a branch.  Allowing the zero extend also allows our
> > existing `ands` patterns to work as expected.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > This patch and the last one have to be committed together, but for ease
> > of review I have split them up here.  With both applied, 209 missed
> > optimization xfails are fixed.
> >
> > No performance difference on SPECCPU 2017, and no failures.
> >
> > Ok for master?
> 
> Did this patch ever get approved? It is a nice improvement that would be
> good to get into GCC 13 before the close of stage 1.

No, it was requested that I make a standalone pass that introduces a new
kind of extension in the mid-end.  Unfortunately, due to constraints on how
much time I can dedicate to that this year, I've had to drop it for GCC 13.

I'll try to pick it up again during GCC 14.

Regards,
Tamar

> 
> Thanks,
> Andrew
> 
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >         * config/aarch64/aarch64.cc (aarch64_promote_function_args_subreg_p):
> >         (TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P): New.
> >         * config/aarch64/aarch64.h (PROMOTE_MODE): Expand doc.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/aarch64/apc-subreg.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index efa46ac0b8799b5849b609d591186e26e5cb37ff..cc74a816fcc6458aa065246a30a4d2184692ad74 100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -34,7 +34,8 @@
> >
> >  #define REGISTER_TARGET_PRAGMAS() aarch64_register_pragmas ()
> >
> > -/* Target machine storage layout.  */
> > +/* Target machine storage layout.  See also
> > +   TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P.  */
> >
> >  #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)    \
> >    if (GET_MODE_CLASS (MODE) == MODE_INT                \
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 2f559600cff55af9d468e8d0810545583cc986f5..252d6c2af72afc1dfee1a86644a5753784b41f59 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -3736,6 +3736,57 @@ aarch64_array_mode_supported_p (machine_mode mode,
> >    return false;
> >  }
> >
> > +/* Implement target hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P to complement
> > +   PROMOTE_MODE.  If any argument promotion was done, do them as subregs.  */
> > +static bool
> > +aarch64_promote_function_args_subreg_p (machine_mode mode,
> > +                                        machine_mode promoted_mode,
> > +                                        int /* unsignedp */, tree parm)
> > +{
> > +  bool candidate_p = GET_MODE_CLASS (mode) == MODE_INT
> > +                    && GET_MODE_CLASS (promoted_mode) == MODE_INT
> > +                    && known_lt (GET_MODE_SIZE (mode), 4)
> > +                    && promoted_mode == SImode;
> > +
> > +  if (!candidate_p)
> > +    return false;
> > +
> > +  if (!parm || !is_gimple_reg (parm))
> > +    return true;
> > +
> > +  tree var = parm;
> > +  if (!VAR_P (var))
> > +    {
> > +      if (TREE_CODE (parm) == SSA_NAME
> > +          && !(var = SSA_NAME_VAR (var)))
> > +       return true;
> > +      else if (TREE_CODE (parm) != PARM_DECL)
> > +       return true;
> > +    }
> > +
> > +  /* If the variable is used inside a comparison which sets CC then we
> > +     should still promote using an extend.  Doing so makes it easier to use
> > +     cbz/cbnz and also avoids repeatedly having to test the value in certain
> > +     circumstances, such as nested ifs that test the same value with calls
> > +     in between.  */
> > +  tree ssa_var = ssa_default_def (cfun, var);
> > +  if (!ssa_var)
> > +    return true;
> > +
> > +  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (ssa_var));
> > +  const ssa_use_operand_t *ptr;
> > +
> > +  for (ptr = head->next; ptr != head; ptr = ptr->next)
> > +    if (USE_STMT(ptr) && is_gimple_assign (USE_STMT (ptr)))
> > +      {
> > +       tree_code code = gimple_assign_rhs_code (USE_STMT(ptr));
> > +       if (TREE_CODE_CLASS (code) == tcc_comparison)
> > +         return false;
> > +      }
> > +
> > +  return true;
> > +}
> > +
> >  /* MODE is some form of SVE vector mode.  For data modes, return the number
> >     of vector register bits that each element of MODE occupies, such as 64
> >     for both VNx2DImode and VNx2SImode (where each 32-bit value is stored
> > @@ -27490,6 +27541,10 @@ aarch64_libgcc_floating_mode_supported_p
> >  #undef TARGET_ARRAY_MODE_SUPPORTED_P
> >  #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
> >
> > +#undef TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
> > +#define TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P \
> > +  aarch64_promote_function_args_subreg_p
> > +
> >  #undef TARGET_VECTORIZE_CREATE_COSTS
> >  #define TARGET_VECTORIZE_CREATE_COSTS aarch64_vectorize_create_costs
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/apc-subreg.c b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..2d7563a11ce11fa677f7ad4bf2a090e6a136e4d9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> > @@ -0,0 +1,103 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +#include <stdint.h>
> > +
> > +/*
> > +** f0:
> > +**     mvn     w0, w0
> > +**     ret
> > +*/
> > +uint8_t f0 (uint8_t xr){
> > +    return (uint8_t) (0xff - xr);
> > +}
> > +
> > +/*
> > +** f1:
> > +**     mvn     w0, w0
> > +**     ret
> > +*/
> > +int8_t f1 (int8_t xr){
> > +    return (int8_t) (0xff - xr);
> > +}
> > +
> > +/*
> > +** f2:
> > +**     mvn     w0, w0
> > +**     ret
> > +*/
> > +uint16_t f2 (uint16_t xr){
> > +    return (uint16_t) (0xffFF - xr);
> > +}
> > +
> > +/*
> > +** f3:
> > +**     mvn     w0, w0
> > +**     ret
> > +*/
> > +uint32_t f3 (uint32_t xr){
> > +    return (uint32_t) (0xffFFffff - xr);
> > +}
> > +
> > +/*
> > +** f4:
> > +**     mvn     x0, x0
> > +**     ret
> > +*/
> > +uint64_t f4 (uint64_t xr){
> > +    return (uint64_t) (0xffFFffffffffffff - xr);
> > +}
> > +
> > +/*
> > +** f5:
> > +**     mvn     w0, w0
> > +**     sub     w0, w0, w1
> > +**     ret
> > +*/
> > +uint8_t f5 (uint8_t xr, uint8_t xc){
> > +    return (uint8_t) (0xff - xr - xc);
> > +}
> > +
> > +/*
> > +** f6:
> > +**     mvn     w0, w0
> > +**     and     w0, w0, 255
> > +**     and     w1, w1, 255
> > +**     mul     w0, w0, w1
> > +**     ret
> > +*/
> > +uint16_t f6 (uint8_t xr, uint8_t xc){
> > +    return ((uint8_t) (0xff - xr)) * xc;
> > +}
> > +
> > +/*
> > +** f7:
> > +**     and     w0, w0, 255
> > +**     and     w1, w1, 255
> > +**     mul     w0, w0, w1
> > +**     ret
> > +*/
> > +uint16_t f7 (uint8_t xr, uint8_t xc){
> > +    return xr * xc;
> > +}
> > +
> > +/*
> > +** f8:
> > +**     mul     w0, w0, w1
> > +**     and     w0, w0, 255
> > +**     ret
> > +*/
> > +uint16_t f8 (uint8_t xr, uint8_t xc){
> > +    return (uint8_t)(xr * xc);
> > +}
> > +
> > +/*
> > +** f9:
> > +**     and     w0, w0, 255
> > +**     add     w0, w0, w1
> > +**     ret
> > +*/
> > +uint16_t f9 (uint8_t xr, uint16_t xc){
> > +    return xr + xc;
> > +}
> >
> >
> >
> >
> > --
