This test has been failing since r15-1619-g3b9b8d6cfdf593, which made IRA prefer a call-clobbered register over a call-preserved register for mem1 (the second load). In this particular case, that just forces the variable p3 to be allocated to a call-preserved register instead, leading to an extra predicate move from p3 to that register.

However, it was really pot luck that this worked before. Each argument is used exactly once, so there isn't an obvious colouring order. And mem0 and mem1 are passed by indirect reference, so they are not REG_EQUIV to a stack slot in the way that some memory arguments are.

IIRC, the test was the result of some experimentation, and so I think the best fix is to rework it to try to make it less sensitive to RA decisions. This patch does that by enabling scheduling for the function and using both memory arguments in the same instruction. This gets rid of the distracting prologue and epilogue code and restricts the test to the PCS parts.

Tested on aarch64-linux-gnu. I'll leave a day or so for comments before pushing.

Richard


gcc/testsuite/
	PR testsuite/116604
	* gcc.target/aarch64/sve/pcs/args_1.c (callee_pred): Enable scheduling
	and use both memory arguments in the same instruction.  Expect no
	prologue and epilogue code.
---
 .../gcc.target/aarch64/sve/pcs/args_1.c       | 31 ++++++++++---------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_1.c
index 6deca329599..b020a043523 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_1.c
@@ -1,31 +1,32 @@
 /* { dg-do compile } */
-/* { dg-options "-O -g" } */
+/* { dg-options "-O -mtune=generic -g" } */
 /* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
 
 #include <arm_sve.h>
 
 /*
 ** callee_pred:
-** addvl sp, sp, #-1
-** str p[0-9]+, \[sp\]
-** str p[0-9]+, \[sp, #1, mul vl\]
-** ldr (p[0-9]+), \[x0\]
-** ldr (p[0-9]+), \[x1\]
-** brkpa (p[0-7])\.b, p0/z, p1\.b, p2\.b
-** brkpb (p[0-7])\.b, \3/z, p3\.b, \1\.b
-** brka p0\.b, \4/z, \2\.b
-** ldr p[0-9]+, \[sp\]
-** ldr p[0-9]+, \[sp, #1, mul vl\]
-** addvl sp, sp, #1
+** brkpa (p[0-3])\.b, p0/z, p1\.b, p2\.b
+** (
+** ldr (p[0-3]), \[x0\]
+** ldr (p[0-3]), \[x1\]
+** brkpb (p[0-3])\.b, \1/z, \2\.b, \3\.b
+** brka p0\.b, \4/z, p3\.b
+** |
+** ldr (p[0-3]), \[x1\]
+** ldr (p[0-3]), \[x0\]
+** brkpb (p[0-3])\.b, \1/z, \6\.b, \5\.b
+** brka p0\.b, \7/z, p3\.b
+** )
 ** ret
 */
-__SVBool_t __attribute__((noipa))
+__SVBool_t __attribute__((noipa, optimize("schedule-insns")))
 callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
              __SVBool_t mem0, __SVBool_t mem1)
 {
   p0 = svbrkpa_z (p0, p1, p2);
-  p0 = svbrkpb_z (p0, p3, mem0);
-  return svbrka_z (p0, mem1);
+  p0 = svbrkpb_z (p0, mem0, mem1);
+  return svbrka_z (p0, p3);
 }
 
 /*
-- 
2.25.1
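
For anyone who wants to sanity-check the alternation and backreference numbering in the new expected body, here is a rough sketch that emulates the match with Python's re module. It is only an approximation: check-function-bodies is driven by Tcl regular expressions rather than Python's, the "( | )" lines are modelled as a non-capturing group so that \1 to \7 keep the numbering used in the test, fields are assumed to be tab-separated, and the sample bodies (ORDER_A, ORDER_B, SWAPPED, WITH_PROLOGUE) are hypothetical hand-written sequences, not real compiler output.

```python
import re

# Python stand-in for the new callee_pred pattern.  The "( | )" lines of
# check-function-bodies are written as a non-capturing group so that the
# capture numbers \1..\7 are the same as in the test.
# Assumption: mnemonic and operands are separated by a single tab.
NEW_BODY = re.compile(
    r"\tbrkpa\t(p[0-3])\.b, p0/z, p1\.b, p2\.b\n"
    r"(?:"
    r"\tldr\t(p[0-3]), \[x0\]\n"
    r"\tldr\t(p[0-3]), \[x1\]\n"
    r"\tbrkpb\t(p[0-3])\.b, \1/z, \2\.b, \3\.b\n"
    r"\tbrka\tp0\.b, \4/z, p3\.b\n"
    r"|"
    r"\tldr\t(p[0-3]), \[x1\]\n"
    r"\tldr\t(p[0-3]), \[x0\]\n"
    r"\tbrkpb\t(p[0-3])\.b, \1/z, \6\.b, \5\.b\n"
    r"\tbrka\tp0\.b, \7/z, p3\.b\n"
    r")"
    r"\tret\n"
)


def body(*insns):
    """Join (mnemonic, operands) pairs into a tab-separated function body."""
    return "".join(
        f"\t{op}\t{args}\n" if args else f"\t{op}\n" for op, args in insns
    )


def matches(text):
    """Return True if the whole body matches the new pattern."""
    return NEW_BODY.fullmatch(text) is not None


# Hypothetical body that loads mem0 (via x0) before mem1 (via x1).
ORDER_A = body(
    ("brkpa", "p0.b, p0/z, p1.b, p2.b"),
    ("ldr", "p1, [x0]"),
    ("ldr", "p2, [x1]"),
    ("brkpb", "p0.b, p0/z, p1.b, p2.b"),
    ("brka", "p0.b, p0/z, p3.b"),
    ("ret", ""),
)

# Same computation with the two loads scheduled in the other order.
ORDER_B = body(
    ("brkpa", "p0.b, p0/z, p1.b, p2.b"),
    ("ldr", "p1, [x1]"),
    ("ldr", "p2, [x0]"),
    ("brkpb", "p0.b, p0/z, p2.b, p1.b"),
    ("brka", "p0.b, p0/z, p3.b"),
    ("ret", ""),
)

# Wrong code: mem0 and mem1 reach brkpb in swapped operand positions.
SWAPPED = body(
    ("brkpa", "p0.b, p0/z, p1.b, p2.b"),
    ("ldr", "p1, [x0]"),
    ("ldr", "p2, [x1]"),
    ("brkpb", "p0.b, p0/z, p2.b, p1.b"),
    ("brka", "p0.b, p0/z, p3.b"),
    ("ret", ""),
)

# A body that still has prologue/epilogue code around the PCS part.
WITH_PROLOGUE = body(("addvl", "sp, sp, #-1")) + ORDER_A[:-len("\tret\n")] + body(
    ("addvl", "sp, sp, #1"),
    ("ret", ""),
)

if __name__ == "__main__":
    for name in ("ORDER_A", "ORDER_B", "SWAPPED", "WITH_PROLOGUE"):
        print(name, matches(globals()[name]))
```

The point of the two alternatives is that either load order is accepted, while the backreferences still tie each loaded register to the right brkpb operand, so a body that mixes up mem0 and mem1 is rejected. Restricting the classes to p[0-3] is also what makes a body with predicate saves and restores fail.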