On Fri, May 13, 2022 at 10:14 AM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> PROMOTE_MODE always promotes 8- and 16-bit parameters to 32 bits.
> This promotion is not required by the ABI, which states:
>
> ```
> C.9   If the argument is an Integral or Pointer Type, the size of the
> argument is less than or equal to 8 bytes and the NGRN is less than 8,
> the argument is copied to the least significant bits in x[NGRN]. The
> NGRN is incremented by one. The argument has now been allocated.
>
> C.16  If the size of the argument is less than 8 bytes then the size of
> the argument is set to 8 bytes. The effect is as if the argument was
> copied to the least significant bits of a 64-bit register and the
> remaining bits filled with unspecified values.
> ```
>
> That is, the upper bits in the registers are unspecified and callees
> cannot assume they hold any particular value.
>
> This means we can avoid the promotion and still get correct code, as the
> language-level promotion rules require values to be extended whenever the
> upper bits are significant.
>
> So if we are e.g. OR-ing two 8-bit values, no extension is needed as the
> top bits are irrelevant.  If we are doing e.g. an addition, then the top
> bits *might* be relevant depending on the result type, but the middle end
> will always contain the appropriate extension in those cases.
>
> The mid-end also has optimizations built around this assumption, which
> the AArch64 port currently actively undoes.
>
> So for instance
>
> uint16_t fd (uint8_t xr){
>     return xr + 1;
> }
>
> uint8_t fd2 (uint8_t xr){
>     return xr + 1;
> }
>
> should produce
>
> fd:                                     // @fd
>         and     w8, w0, #0xff
>         add     w0, w8, #1
>         ret
> fd2:                                    // @fd2
>         add     w0, w0, #1
>         ret
>
> like clang does, instead of
>
> fd:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
> fd2:
>         and     w0, w0, 255
>         add     w0, w0, 1
>         ret
>
> like we do now.  Removing this forced extension maintains correctness
> while fixing various codegen defects.  It also brings us in line with
> clang.
>
> Note that C, C++, Fortran, etc. all correctly specify what should happen
> w.r.t. extensions for e.g. array indexing and pointer arithmetic, so we
> never get incorrect code.
>
> There is, however, a second reason for doing this promotion: RTL
> efficiency.  The promotion saves us from having to extend the values to
> SImode in order to use them in instructions and then truncate them again
> afterwards.
>
> To get both the efficiency and the simpler RTL, we can instead promote to
> a paradoxical subreg.  This patch implements the
> TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P hook for AArch64 and adds an
> explicit opt-out for values that feed into comparisons.  This is done
> because:
>
> 1. Our comparison patterns already allow us to absorb the zero extend.
> 2. The extension allows us to use cbz/cbnz/tbz etc.  In some cases such as
>
> int foo (char a, char b)
> {
>   if (a)
>     if (b)
>       bar1 ();
>     else
>       ...
>   else
>     if (b)
>       bar2 ();
>     else
>       ...
> }
>
> by zero extending the value we can avoid having to repeatedly test it
> before each branch.  Allowing the zero extend also lets our existing
> `ands` patterns work as expected.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> This patch and the previous one need to be committed together, but for
> ease of review I have split them up here.  Together they fix 209
> missed-optimization xfails.
>
> No performance difference on SPEC CPU 2017, and no failures.
>
> Ok for master?
Did this patch ever get approved?  It is a nice improvement that would be
good to get into GCC 13 before the close of stage 1.

Thanks,
Andrew

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.cc (aarch64_promote_function_args_subreg_p):
>         (TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P): New.
>         * config/aarch64/aarch64.h (PROMOTE_MODE): Expand doc.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/apc-subreg.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index efa46ac0b8799b5849b609d591186e26e5cb37ff..cc74a816fcc6458aa065246a30a4d2184692ad74 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -34,7 +34,8 @@
>
>  #define REGISTER_TARGET_PRAGMAS() aarch64_register_pragmas ()
>
> -/* Target machine storage layout.  */
> +/* Target machine storage layout.  See also
> +   TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P.  */
>
>  #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)	\
>    if (GET_MODE_CLASS (MODE) == MODE_INT	\
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 2f559600cff55af9d468e8d0810545583cc986f5..252d6c2af72afc1dfee1a86644a5753784b41f59 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -3736,6 +3736,57 @@ aarch64_array_mode_supported_p (machine_mode mode,
>    return false;
>  }
>
> +/* Implement target hook TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P to complement
> +   PROMOTE_MODE.  If any argument promotion was done, do them as subregs.  */
> +static bool
> +aarch64_promote_function_args_subreg_p (machine_mode mode,
> +					machine_mode promoted_mode,
> +					int /* unsignedp */, tree parm)
> +{
> +  bool candidate_p = GET_MODE_CLASS (mode) == MODE_INT
> +		     && GET_MODE_CLASS (promoted_mode) == MODE_INT
> +		     && known_lt (GET_MODE_SIZE (mode), 4)
> +		     && promoted_mode == SImode;
> +
> +  if (!candidate_p)
> +    return false;
> +
> +  if (!parm || !is_gimple_reg (parm))
> +    return true;
> +
> +  tree var = parm;
> +  if (!VAR_P (var))
> +    {
> +      if (TREE_CODE (parm) == SSA_NAME
> +	  && !(var = SSA_NAME_VAR (var)))
> +	return true;
> +      else if (TREE_CODE (parm) != PARM_DECL)
> +	return true;
> +    }
> +
> +  /* If the variable is used inside a comparison which sets CC then we should
> +     still promote using an extend.  Doing so makes it easier to use cbz/cbnz
> +     and also avoids repeatedly having to test the value in certain
> +     circumstances, such as nested ifs that test the same value with calls
> +     in between.  */
> +  tree ssa_var = ssa_default_def (cfun, var);
> +  if (!ssa_var)
> +    return true;
> +
> +  const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (ssa_var));
> +  const ssa_use_operand_t *ptr;
> +
> +  for (ptr = head->next; ptr != head; ptr = ptr->next)
> +    if (USE_STMT (ptr) && is_gimple_assign (USE_STMT (ptr)))
> +      {
> +	tree_code code = gimple_assign_rhs_code (USE_STMT (ptr));
> +	if (TREE_CODE_CLASS (code) == tcc_comparison)
> +	  return false;
> +      }
> +
> +  return true;
> +}
> +
>  /* MODE is some form of SVE vector mode.  For data modes, return the number
>     of vector register bits that each element of MODE occupies, such as 64
>     for both VNx2DImode and VNx2SImode (where each 32-bit value is stored
> @@ -27490,6 +27541,10 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_ARRAY_MODE_SUPPORTED_P
>  #define TARGET_ARRAY_MODE_SUPPORTED_P aarch64_array_mode_supported_p
>
> +#undef TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P
> +#define TARGET_PROMOTE_FUNCTION_ARGS_SUBREG_P \
> +  aarch64_promote_function_args_subreg_p
> +
>  #undef TARGET_VECTORIZE_CREATE_COSTS
>  #define TARGET_VECTORIZE_CREATE_COSTS aarch64_vectorize_create_costs
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/apc-subreg.c b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2d7563a11ce11fa677f7ad4bf2a090e6a136e4d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/apc-subreg.c
> @@ -0,0 +1,103 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <stdint.h>
> +
> +/*
> +** f0:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint8_t f0 (uint8_t xr){
> +    return (uint8_t) (0xff - xr);
> +}
> +
> +/*
> +** f1:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +int8_t f1 (int8_t xr){
> +    return (int8_t) (0xff - xr);
> +}
> +
> +/*
> +** f2:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint16_t f2 (uint16_t xr){
> +    return (uint16_t) (0xffFF - xr);
> +}
> +
> +/*
> +** f3:
> +**	mvn	w0, w0
> +**	ret
> +*/
> +uint32_t f3 (uint32_t xr){
> +    return (uint32_t) (0xffFFffff - xr);
> +}
> +
> +/*
> +** f4:
> +**	mvn	x0, x0
> +**	ret
> +*/
> +uint64_t f4 (uint64_t xr){
> +    return (uint64_t) (0xffFFffffffffffff - xr);
> +}
> +
> +/*
> +** f5:
> +**	mvn	w0, w0
> +**	sub	w0, w0, w1
> +**	ret
> +*/
> +uint8_t f5 (uint8_t xr, uint8_t xc){
> +    return (uint8_t) (0xff - xr - xc);
> +}
> +
> +/*
> +** f6:
> +**	mvn	w0, w0
> +**	and	w0, w0, 255
> +**	and	w1, w1, 255
> +**	mul	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f6 (uint8_t xr, uint8_t xc){
> +    return ((uint8_t) (0xff - xr)) * xc;
> +}
> +
> +/*
> +** f7:
> +**	and	w0, w0, 255
> +**	and	w1, w1, 255
> +**	mul	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f7 (uint8_t xr, uint8_t xc){
> +    return xr * xc;
> +}
> +
> +/*
> +** f8:
> +**	mul	w0, w0, w1
> +**	and	w0, w0, 255
> +**	ret
> +*/
> +uint16_t f8 (uint8_t xr, uint8_t xc){
> +    return (uint8_t)(xr * xc);
> +}
> +
> +/*
> +** f9:
> +**	and	w0, w0, 255
> +**	add	w0, w0, w1
> +**	ret
> +*/
> +uint16_t f9 (uint8_t xr, uint16_t xc){
> +    return xr + xc;
> +}
>
>
> --
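
To make the intended split concrete, here is a small sketch. The functions
below are made up for illustration (they are not part of the patch), and the
codegen noted in the comments is an assumption based on Tamar's description
above, not output verified against the patch:

```
#include <stdint.h>

extern void use (unsigned);

/* OR-ing two 8-bit values: as the description above notes, the top bits
   are irrelevant here, so with the paradoxical-subreg promotion no
   'and wN, wN, 255' masking of the incoming arguments should be needed.  */
uint8_t or8 (uint8_t a, uint8_t b)
{
  return a | b;
}

/* The argument feeds a comparison: this is the case where the opt-out
   described above prefers keeping the zero extend, so the extended value
   can be tested once with cbz/cbnz rather than being re-tested before
   every branch.  */
void maybe_use (uint8_t x)
{
  if (x)
    use (x);
}
```

The sketch mirrors the two cases the hook distinguishes: plain data uses,
where unspecified upper bits are harmless, and comparison uses, where an
extended value enables cbz/cbnz and the existing `ands` patterns.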