https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65540
--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
OK, this patch extends the calls.c hack:

Index: calls.c
===================================================================
--- calls.c	(revision 221805)
+++ calls.c	(working copy)
@@ -1321,6 +1321,15 @@ initialize_argument_information (int num
 		      && TREE_CODE (base) != SSA_NAME
 		      && (!DECL_P (base) || MEM_P (DECL_RTL (base)))))
 	    {
+	      /* We may have turned the parameter value into an SSA name.
+		 Go back to the original parameter so we can take the
+		 address.  */
+	      if (TREE_CODE (args[i].tree_value) == SSA_NAME)
+		{
+		  gcc_assert (SSA_NAME_IS_DEFAULT_DEF (args[i].tree_value));
+		  args[i].tree_value = SSA_NAME_VAR (args[i].tree_value);
+		  gcc_assert (TREE_CODE (args[i].tree_value) == PARM_DECL);
+		}
 	      /* Argument setup code may have copied the value to register.  We
 		 revert that optimization now because the tail call code must
 		 use the original location.  */

We still fail to produce a tail call, because the logic checking return values
triggers.  At least this saves the extra copy of the parameter:

func1:
	subq	$56, %rsp
	.seh_stackalloc	56
	.seh_endprologue
	movq	%rcx, %r8
	leaq	32(%rsp), %rcx
	call	func2
	fldt	32(%rsp)
	movq	%r8, %rax
	fstpt	(%r8)
	addq	$56, %rsp
	ret
	.seh_endproc
	.section	.text.unlikely,"x"

I will regtest & bootstrap this on ppc, but it would be nice if someone could
verify that the generated code works (it seems RDX is holding the parameter
pointer and RCX the temporary slot).

Making this a tail call seems to need more work, because the link to the
actual return address is lost much earlier:

#0  initialize_argument_information (num_actuals=2, args=0x3fffffffdb80,
    args_size=0x3fffffffdf68, n_named_args=3, exp=0x3fffaf9415e0,
    struct_value_addr_value=0x3fffaf9518c0, fndecl=0x3fffafa5e580,
    fntype=0x3fffaf8c5898, args_so_far=..., reg_parm_stack_space=32,
    old_stack_level=0x3fffffffdfb0, old_pending_adj=0x3fffffffdfb8,
    must_preallocate=0x3fffffffdf9c, ecf_flags=0x3fffffffdfa0,
    may_tailcall=0x3fffffffdf60, call_from_thunk_p=true)
    at ../../gcc/calls.c:1343
#1  0x000000001038abfc in expand_call (exp=0x3fffaf9415e0, target=0x0,
    ignore=0) at ../../gcc/calls.c:2695
#2  0x00000000104f0980 in expand_expr_real_1 (exp=exp@entry=0x3fffaf9415e0,
    target=<optimized out>, tmode=<optimized out>, modifier=<optimized out>,
    alt_rtl=0x3fffffffe308, inner_reference_p=<optimized out>)
    at ../../gcc/expr.c:10492
#3  0x00000000104f50b0 in expand_expr_real (exp=exp@entry=0x3fffaf9415e0,
    target=<optimized out>, tmode=<optimized out>,
    modifier=modifier@entry=EXPAND_NORMAL,
    alt_rtl=alt_rtl@entry=0x3fffffffe308,
    inner_reference_p=inner_reference_p@entry=false)
    at ../../gcc/expr.c:8018
#4  0x00000000104ff7f0 in store_expr_with_bounds (exp=exp@entry=0x3fffaf9415e0,
    target=target@entry=0x3fffaf88c5e8, call_param_p=call_param_p@entry=0,
    nontemporal=nontemporal@entry=false, btarget=btarget@entry=0x3fffaf920ee8)
    at ../../gcc/expr.c:5385
#5  0x0000000010502188 in expand_assignment (to=0x3fffaf920ee8,
    from=0x3fffaf9415e0, nontemporal=<optimized out>)
    at ../../gcc/expr.c:5154

Already in expand_assignment, "to" is set as follows:

 <ssa_name 0x3fffaf920ee8
    type <real_type 0x3fffaf8c1500 long double XF
        size <integer_cst 0x3fffaf880cf0 constant 128>
        unit size <integer_cst 0x3fffaf880d08 constant 16>
        align 128 symtab 0 alias set 1 canonical type 0x3fffaf8c1500 precision 80
        pointer_to_this <pointer_type 0x3fffaf8c16f8>>
    visited var <var_decl 0x3fffaf9514d0 retval.2>def_stmt retval.2_2 = func2 (x_1(D)); [return slot optimization] [tail call]
    version 2>

and retval is a temporary produced by expand_thunk.  I suppose expand_thunk
must not create the temporary in this case.
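
For anyone wanting to check the generated code by hand, here is a rough C sketch of the call shape in the dump above: func1 forwards its long double parameter to func2 and returns the result unchanged. This is only an assumption about the shape, not the PR's testcase. The body of func2 and the check_forwarding helper are made up for illustration, and since the backtrace shows call_from_thunk_p=true, the real func1 is presumably a compiler-generated thunk, not hand-written code like this.

```c
#include <assert.h>

/* Hypothetical callee; the real func2 from the PR is not shown in this
   comment.  noinline keeps the call visible in the generated assembly.  */
__attribute__ ((noinline)) long double
func2 (long double x)
{
  return x + 1.0L;
}

/* Forwarder with the same shape as "retval.2_2 = func2 (x_1(D))":
   the parameter is passed on untouched and the result is returned.  */
long double
func1 (long double x)
{
  return func2 (x);
}

/* Illustrative runtime check: the value must survive the forwarding,
   whether or not the call ends up as a tail call.  */
void
check_forwarding (void)
{
  assert (func1 (1.5L) == 2.5L);
}
```

Compiling this for x86_64-w64-mingw32 with optimization and comparing the output for func1 against the listing above is one way to see whether the parameter copy is gone; the exact output will depend on the compiler revision.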