Hi Suwa-san, On Thu, Jan 26, 2023 at 7:17 PM Takayuki 'January June' Suwa <jjsuwa_sys3...@yahoo.co.jp> wrote: > > In the case of the CALL0 ABI, values that must be retained before and > after function calls are placed in the callee-saved registers (A12 > through A15) and referenced later. However, it is often the case that > the save and the reference are each only once and a simple register- > register move (with two exceptions; i. the register saved to/restored > from is the stack pointer, ii. the function needs an additional stack > pointer adjustment to grow the stack). > > e.g. in the following example, if there are no other occurrences of > register A14: > > ;; before > ; prologue { > ... > s32i.n a14, sp, 16 > ... ;; no frame pointer needed > ;; no additional stack growth > ; } prologue > ... > mov.n a14, a6 ;; A6 is not SP > ... > call0 foo > ... > mov.n a8, a14 ;; A8 is not SP > ... > ; epilogue { > ... > l32i.n a14, sp, 16 > ... > ; } epilogue > > It can be possible like this: > > ;; after > ; prologue { > ... > (no save needed) > ... > ; } prologue > ... > s32i.n a6, sp, 16 ;; replaced with A14's slot > ... > call0 foo > ... > l32i.n a8, sp, 16 ;; through SP > ... > ; epilogue { > ... > (no restoration needed) > ... > ; } epilogue > > This patch adds the abovementioned logic to the function prologue/epilogue > RTL expander code. > > gcc/ChangeLog: > > * config/xtensa/xtensa.cc (machine_function): Add new member > 'eliminated_callee_saved_bmp'. > (xtensa_can_eliminate_callee_saved_reg_p): New function to > determine whether the register can be eliminated or not. > (xtensa_expand_prologue): Add invoking the above function and > elimination the use of callee-saved register by using its stack > slot through the stack pointer (or the frame pointer if needed) > directly. > (xtensa_expand_prologue): Modify to not emit register restoration > insn from its stack slot if the register is already eliminated. > > gcc/testsuite/ChangeLog: > > * gcc.target/xtensa/elim_callee_saved.c: New. > --- > gcc/config/xtensa/xtensa.cc | 132 ++++++++++++++---- > .../gcc.target/xtensa/elim_callee_saved.c | 38 +++++ > 2 files changed, 145 insertions(+), 25 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c
This version passes regression tests, but I still have a couple questions. > diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc > index 3e2e22d4cbe..ff59c933d4d 100644 > --- a/gcc/config/xtensa/xtensa.cc > +++ b/gcc/config/xtensa/xtensa.cc > @@ -105,6 +105,7 @@ struct GTY(()) machine_function > bool epilogue_done; > bool inhibit_logues_a1_adjusts; > rtx last_logues_a9_content; > + HOST_WIDE_INT eliminated_callee_saved_bmp; > }; > > static void xtensa_option_override (void); > @@ -3343,6 +3344,66 @@ xtensa_emit_adjust_stack_ptr (HOST_WIDE_INT offset, > int flags) > cfun->machine->last_logues_a9_content = GEN_INT (offset); > } > > +static bool > +xtensa_can_eliminate_callee_saved_reg_p (unsigned int regno, > + rtx_insn **p_insnS, > + rtx_insn **p_insnR) > +{ > + df_ref ref; > + rtx_insn *insn, *insnS = NULL, *insnR = NULL; > + rtx pattern; > + > + if (!optimize || !df || call_used_or_fixed_reg_p (regno)) > + return false; > + > + for (ref = DF_REG_DEF_CHAIN (regno); > + ref; ref = DF_REF_NEXT_REG (ref)) > + if (DF_REF_CLASS (ref) != DF_REF_REGULAR > + || DEBUG_INSN_P (insn = DF_REF_INSN (ref))) > + continue; > + else if (GET_CODE (pattern = PATTERN (insn)) == SET > + && REG_P (SET_DEST (pattern)) > + && REGNO (SET_DEST (pattern)) == regno > + && REG_NREGS (SET_DEST (pattern)) == 1 > + && REG_P (SET_SRC (pattern)) > + && REGNO (SET_SRC (pattern)) != A1_REG) Do I understand correctly that the check for A1 here and below is for the case when regno is a hard frame pointer and the function needs the frame pointer? If so, wouldn't it be better to check for it explicitly in the beginning? > + { > + if (insnS) > + return false; > + insnS = insn; > + continue; > + } > + else > + return false; > + > + for (ref = DF_REG_USE_CHAIN (regno); > + ref; ref = DF_REF_NEXT_REG (ref)) > + if (DF_REF_CLASS (ref) != DF_REF_REGULAR > + || DEBUG_INSN_P (insn = DF_REF_INSN (ref))) > + continue; > + else if (GET_CODE (pattern = PATTERN (insn)) == SET > + && REG_P (SET_SRC (pattern)) > + && REGNO (SET_SRC (pattern)) == regno > + && REG_NREGS (SET_SRC (pattern)) == 1 > + && REG_P (SET_DEST (pattern)) > + && REGNO (SET_DEST (pattern)) != A1_REG) > + { > + if (insnR) > + return false; > + insnR = insn; > + continue; > + } > + else > + return false; > + > + if (!insnS || !insnR) > + return false; > + > + *p_insnS = insnS, *p_insnR = insnR; > + > + return true; > +} [...] > diff --git a/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c > b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c > new file mode 100644 > index 00000000000..cd3d6b9f249 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mabi=call0" } */ > + > +extern void foo(void); > + > +/* eliminated one register (the reservoir of variable 'a') by its stack slot > through the stack pointer. */ > +int test0(int a) { > + int array[252]; /* the maximum bound of non-large stack. */ > + foo(); > + asm volatile("" : : "m"(array)); > + return a; > +} > + > +/* cannot eliminate if large stack is needed, because the offset from TOS > cannot fit into single L32I/S32I instruction. */ > +int test1(int a) { > + int array[10000]; /* requires large stack. */ > + foo(); > + asm volatile("" : : "m"(array)); > + return a; > +} > + > +/* register A15 is the reservoir of the stack pointer and cannot be > eliminated if the frame pointer is needed. > + other registers still can be, but through the frame pointer rather the > stack pointer. */ > +int test2(int a) { > + int* p = __builtin_alloca(16); > + foo(); > + asm volatile("" : : "r"(p)); > + return a; > +} > + > +/* in -O0 the composite hard registers may still remain unsplitted at > pro_and_epilogue and must be excluded. */ > +extern double bar(void); > +int __attribute__((optimize(0))) test3(int a) { > + return bar() + a; > +} > + > +/* { dg-final { scan-assembler-times "mov\t|mov.n\t" 21 } } */ This test looks quite fragile as the number of movs would vary when the testsuite is run with additional options. > +/* { dg-final { scan-assembler-times "a15, 8" 2 } } */ > -- > 2.30.2 -- Thanks. -- Max