Renlin Li <renlin...@foss.arm.com> writes: > Hi all, > > In aarch64 backend, ip0/ip1 register will be used in the prologue/epilogue as > temporary register. > > When the compiler is performing sibcall optimization. It has the chance to use > ip0/ip1 register for indirect function call to hold the address. However, > those two register might > be clobbered by the epilogue code which makes the last sibcall instruction > invalid. > > The following is an extreme example: > When built with -O2 -ffixed-x0 -ffixed-x1 -ffixed-x2 -ffixed-x3 -ffixed-x4 > -ffixed-x5 -ffixed-x6 -ffixed-x7 > -ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 > -ffixed-x14 -ffixed-x15 -ffixed-x17 -ffixed-x18 > > void (*f)(); > int xx; > > void tailcall (int i) > > { > int arr[5000]; > xx = arr[i]; > f(); > } > > > tailcall: > mov x16, 20016 > sub sp, sp, x16 > adrp x16, .LANCHOR0 > stp x19, x30, [sp] > add x19, sp, 16 > ldr s0, [x19, w0, sxtw 2] > ldp x19, x30, [sp] > str s0, [x16, #:lo12:.LANCHOR0] > mov x16, 20016 > add sp, sp, x16 > br x16 // oops > > > As we can see, x16 is used in the indirect sibcall instruction. It is used as > a temporary in the epilogue code as well. The register allocation is invalid. > > With the change, the register allocator is only allowed to use r0-r15, r18 for > indirect sibcall instruction. > > For this particular case above, the compiler will ICE as there is not register > could be used for this sibcall instruction. > And I think it is better to fail instead of wrong code-generation. > > test.c:10:1: error: unable to generate reloads for: > } > ^ > (call_insn/j 16 12 17 2 (parallel [ > (call (mem:DI (reg/f:DI 84 [ f ]) [0 *f.0_2 S8 A8]) > (const_int 0 [0])) > (return) > ]) "test.c":9 42 {*sibcall_insn} > (expr_list:REG_DEAD (reg/f:DI 84 [ f ]) > (expr_list:REG_CALL_DECL (nil) > (nil))) > (expr_list (clobber (reg:DI 17 x17)) > (expr_list (clobber (reg:DI 16 x16)) > (nil)))) > > aarch64-none-elf test without regressions. Okay to commit? > The same issue affects gcc-6, gcc-7 as well. Backport are needed for those > branches.
The patch looks good to me FWIW. How about adding something like the following testcase? ------------------------------------ /* { dg-do run } */ /* { dg-options "-O2" } */ typedef void (*fun) (void); void __attribute__ ((noipa)) f (fun x1) { register fun x2 asm ("x16"); int arr[5000]; int *volatile ptr = arr; asm ("mov %0, %1" : "=r" (x2) : "r" (x1)); x2 (); } void g (void) {} int main (void) { f (g); } ------------------------------------ > [...] > diff --git a/gcc/config/aarch64/constraints.md > b/gcc/config/aarch64/constraints.md > index > af4143ef756464afac29d17f124b436520f90451..c3791aa89562a5d5542098d2f7951afc57901150 > 100644 > --- a/gcc/config/aarch64/constraints.md > +++ b/gcc/config/aarch64/constraints.md > @@ -21,8 +21,8 @@ > (define_register_constraint "k" "STACK_REG" > "@internal The stack register.") > > -(define_register_constraint "Ucs" "CALLER_SAVE_REGS" > - "@internal The caller save registers.") > +(define_register_constraint "Ucs" "TAILCALL_ADDR_REGS" > + "@internal The indirect tail call address registers") > > (define_register_constraint "w" "FP_REGS" > "Floating point and SIMD vector registers.") Maybe "@internal Registers suitable for an indirect tail call"? Unlike the caller-save registers, these aren't a predefined set. Thanks, Richard