Renlin Li <renlin...@foss.arm.com> writes:
> Hi all,
>
> In aarch64 backend, ip0/ip1 register will be used in the prologue/epilogue as
> temporary register.
>
> When the compiler is performing sibcall optimization. It has the chance to use
> ip0/ip1 register for indirect function call to hold the address. However, 
> those two register might
> be clobbered by the epilogue code which makes the last sibcall instruction
> invalid.
>
> The following is an extreme example:
> When built with -O2 -ffixed-x0 -ffixed-x1 -ffixed-x2 -ffixed-x3 -ffixed-x4 
> -ffixed-x5 -ffixed-x6 -ffixed-x7
> -ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 
> -ffixed-x14 -ffixed-x15 -ffixed-x17 -ffixed-x18
>
> void (*f)();
> int xx;
>
> void tailcall (int i)
>
> {
>     int arr[5000];
>     xx = arr[i];
>     f();
> }
>
>
> tailcall:
>       mov     x16, 20016
>       sub     sp, sp, x16
>       adrp    x16, .LANCHOR0
>       stp     x19, x30, [sp]
>       add     x19, sp, 16
>       ldr     s0, [x19, w0, sxtw 2]
>       ldp     x19, x30, [sp]
>       str     s0, [x16, #:lo12:.LANCHOR0]
>       mov     x16, 20016
>       add     sp, sp, x16
>       br      x16   // oops
>
>
> As we can see, x16 is used in the indirect sibcall instruction. It is used as
> a temporary in the epilogue code as well. The register allocation is invalid.
>
> With the change, the register allocator is only allowed to use r0-r15, r18 for
> indirect sibcall instruction.
>
> For this particular case above, the compiler will ICE as there is not register
> could be used for this sibcall instruction.
> And I think it is better to fail instead of wrong code-generation.
>
> test.c:10:1: error: unable to generate reloads for:
>   }
>   ^
> (call_insn/j 16 12 17 2 (parallel [
>              (call (mem:DI (reg/f:DI 84 [ f ]) [0 *f.0_2 S8 A8])
>                  (const_int 0 [0]))
>              (return)
>          ]) "test.c":9 42 {*sibcall_insn}
>       (expr_list:REG_DEAD (reg/f:DI 84 [ f ])
>          (expr_list:REG_CALL_DECL (nil)
>              (nil)))
>      (expr_list (clobber (reg:DI 17 x17))
>          (expr_list (clobber (reg:DI 16 x16))
>              (nil))))
>
> aarch64-none-elf test without regressions. Okay to commit?
> The same issue affects gcc-6, gcc-7 as well. Backport are needed for those 
> branches.

The patch looks good to me FWIW.  How about adding something like
the following testcase?

------------------------------------
/* { dg-do run } */
/* { dg-options "-O2" } */

typedef void (*fun) (void);

void __attribute__ ((noipa))
f (fun x1)
{
  register fun x2 asm ("x16");
  int arr[5000];
  int *volatile ptr = arr;
  asm ("mov %0, %1" : "=r" (x2) : "r" (x1));
  x2 ();
}

void g (void) {}

int
main (void)
{
  f (g);
}
------------------------------------

> [...]
> diff --git a/gcc/config/aarch64/constraints.md 
> b/gcc/config/aarch64/constraints.md
> index 
> af4143ef756464afac29d17f124b436520f90451..c3791aa89562a5d5542098d2f7951afc57901150
>  100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -21,8 +21,8 @@
>  (define_register_constraint "k" "STACK_REG"
>    "@internal The stack register.")
>  
> -(define_register_constraint "Ucs" "CALLER_SAVE_REGS"
> -  "@internal The caller save registers.")
> +(define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
> +  "@internal The indirect tail call address registers")
>  
>  (define_register_constraint "w" "FP_REGS"
>    "Floating point and SIMD vector registers.")

Maybe "@internal Registers suitable for an indirect tail call"?
Unlike the caller-save registers, these aren't a predefined set.

Thanks,
Richard

Reply via email to