Ping.

On 28/07/2016 17:36, Adhemerval Zanella wrote:
> From: Adhemerval Zanella <adhemerval.zane...@linaro.org>
> 
> This patch adds the split-stack support on aarch64 (PR #67877).  As for
> other ports this patch should be used along with glibc and gold support.
> 
> The support is done similarly to other architectures: a __private_ss field is
> added to the TCB in glibc, a target-specific __morestack implementation and
> helper functions are added in libgcc, and the compiler gets the supporting
> adjustments (split-stack prologue, va_start for argument handling).  I also
> plan to send the gold support to adjust stack allocation across split-stack
> and default code calls.
> 
> The current approach is similar to the powerpc one: at most 2 GB of stack
> allocation is supported so stack adjustments can be done with 2 instructions
> (either just a movn plus nop or a movn followed by movk).  The morestack call
> is non standard with x10 holding the requested stack pointer and x11 the
> required stack size to be copied.  Also function arguments on the old stack
> are accessed based on a value relative to the stack pointer, so x10 is used to
> hold the old stack value.  Unwinding is handled by a personality routine that
> knows how to find stack segments.
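
A minimal sketch of why the 2 GB limit keeps the allocation to two instructions, assuming the negated frame size is always built as a movn of the low 16 bits followed by an optional movk of bits 16-31. The struct and function names are hypothetical and this is not GCC's actual immediate encoder, which may pick other single-instruction forms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two-slot "allocate" sequence.  */
struct alloc_pair
{
  uint16_t movn_imm;   /* movn x10, #movn_imm  (x10 = ~movn_imm)       */
  int needs_movk;      /* 0 means the second slot is a nop             */
  uint16_t movk_imm;   /* movk x10, #movk_imm, lsl #16                 */
};

/* Encode -frame_size for 1 <= frame_size <= 2G.  */
static struct alloc_pair
encode_negative_alloc (int64_t frame_size)
{
  assert (frame_size >= 1 && frame_size <= ((int64_t) 1 << 31));

  uint64_t value = (uint64_t) -frame_size;
  uint64_t inverted = ~value;  /* equals frame_size - 1, fits in 31 bits */
  struct alloc_pair p;

  p.movn_imm = (uint16_t) (inverted & 0xffff);
  p.movk_imm = (uint16_t) ((value >> 16) & 0xffff);
  /* movn already leaves bits 16-63 set to one; movk is only needed when
     bits 16-31 of the value are not all ones.  */
  p.needs_movk = (inverted >> 16) != 0;
  return p;
}

/* Replay the pair the way the CPU would and return the value in x10.  */
static int64_t
decode_alloc (struct alloc_pair p)
{
  uint64_t x10 = ~(uint64_t) p.movn_imm;

  if (p.needs_movk)
    x10 = (x10 & ~((uint64_t) 0xffff << 16)) | ((uint64_t) p.movk_imm << 16);
  return (int64_t) x10;
}
```

Because the upper 32 bits stay all ones after the movn, any value down to -2G round-trips through these two slots, which is what lets a linker patch the pair in place.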
> 
> Split-stack prologue on function entry is as follows (this goes before the
> usual function prologue):
> 
>       mrs    x9, tpidr_el0
>       mov    x10, -<required stack allocation>
>       nop/movk
>       add    x10, sp, x10
>       ldr    x9, [x9, 16]
>       cmp    x10, x9
>       b.cs    enough
>       stp    x30, [sp, -16]!
>       mov    x11, <required arguments copy size>
>       bl     __morestack
>       ldp    x30, [sp], 16
>       ret
> enough:
>       # usual function prologue, modified a little at the end to set up the
>       # arg_pointer in x10, starts here.  The arg_pointer is initialized,
>       # if it is used, with
>       mov     x11, <required stack allocation>
>       add     x10, x29, x11
>       b.cs    function
>       mov     x10, x28
> function:
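
The control flow of the prologue above can be sketched in C as follows. This is only a model under stated assumptions: the function names are hypothetical, registers are passed as plain integers, and the carry flag is passed as a bool. Per the morestack.S part of the patch, __morestack enters the function with carry clear (cmp x30, x9 with x9 = x30 + 8), so the x28 path is taken in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models "add x10, sp, x10; ldr x9, [x9, 16]; cmp x10, x9; b.cs enough".
   b.cs branches on an unsigned >= comparison, so unsigned arithmetic
   matches the hardware behaviour.  */
static bool
split_stack_has_enough (uintptr_t sp, uintptr_t frame_size,
                        uintptr_t private_ss)
{
  /* Model assumption: the subtraction does not wrap around.  */
  assert (sp >= frame_size);

  uintptr_t requested = sp - frame_size;
  return requested >= private_ss;
}

/* Models "mov x11, <alloc>; add x10, x29, x11; b.cs function; mov x10, x28".
   With carry set the function was entered directly and its arguments sit
   above the current frame; with carry clear it was entered through
   __morestack, which left the old-stack argument pointer in x28.  */
static uintptr_t
split_stack_arg_pointer (bool carry_set, uintptr_t x29,
                         uintptr_t frame_size, uintptr_t x28)
{
  return carry_set ? x29 + frame_size : x28;
}
```

When the first function returns false, the real prologue saves x30, loads the argument copy size into x11 and calls __morestack.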
> 
> Notes:
>  1. Even if a function does not allocate a stack frame, a split-stack prologue
>     is created.  It is to avoid issues with tail call for external symbols
>     which might require linker adjustment (libgo/runtime/go-varargs.c).
> 
>  2. Basic-block reordering (enabled with -O2) will move split-stack TCB ldr
>     to after the required stack calculation.
> 
>  3. Similar to powerpc, when the linker detects a call from split-stack to
>     non-split-stack code, it adds 16k (or more) to the value found in
>     "allocate" instructions (so non-split-stack code gets a larger stack).
>     The amount is tunable by a linker option.  The edit means aarch64 does
>     not need to implement __morestack_non_split, necessary on x86 because
>     insufficient space is available there to edit the stack comparison code.
>     This feature is only implemented in the GNU gold linker.
> 
>  4. AArch64 does not handle >2G stack initially and although it is possible
>     to implement it, limiting to 2G allows the allocation to be materialized
>     with only 2 instructions (mov + movk), which simplifies the linker
>     adjustments required.  Supporting multiple threads each requiring more
>     than 2G of stack is probably not that important, and likely to OOM at
>     run time.
> 
>  5. The TCB support on GLIBC is meant to be included in version 2.25 [1].
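
Notes 3 and 4 interact: the linker's extra allocation has to stay within the same 2G budget so the patched value still fits the two-instruction pair. A minimal sketch of that arithmetic, assuming a default extra of 16k. The names are hypothetical, and returning -1 on overflow is an assumption made here for illustration; the patch does not say how gold reports that case.

```c
#include <assert.h>
#include <stdint.h>

/* Largest allocation the two-instruction sequence can express.  */
#define SPLIT_STACK_MAX_ALLOC ((int64_t) 1 << 31)
/* Assumed default extra space for calls into non-split-stack code.  */
#define NON_SPLIT_EXTRA_DEFAULT ((int64_t) 0x4000)

/* Return the enlarged allocation, or -1 if it no longer fits in 2G.  */
static int64_t
adjust_alloc_for_non_split (int64_t frame_size, int64_t extra)
{
  assert (frame_size >= 0 && extra >= 0);

  if (frame_size > SPLIT_STACK_MAX_ALLOC
      || extra > SPLIT_STACK_MAX_ALLOC - frame_size)
    return -1;
  return frame_size + extra;
}
```

The enlarged value would then be re-encoded into the same two instruction slots, which is why the compiler always emits two (padding with a nop when one would do).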
> 
> I tested bootstrapping on aarch64-linux-gnu and although still digesting
> the results I saw no regression.  All cgo tests are passing, although based
> on previous reports in other archs gold support should be present to avoid
> issues on split calling non-split code.
> 
> libgcc/ChangeLog:
> 
>       * libgcc/config.host: Use t-stack and t-stack-aarch64 for
>       aarch64*-*-linux.
>       * libgcc/config/aarch64/morestack-c.c: New file.
>       * libgcc/config/aarch64/morestack.S: Likewise.
>       * libgcc/config/aarch64/t-stack-aarch64: Likewise.
>       * libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
>       code.
> 
> gcc/ChangeLog:
> 
>       * common/config/aarch64/aarch64-common.c
>       (aarch64_supports_split_stack): New function.
>       (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>       * gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
>       macro.
>       * gcc/config/aarch64/aarch64-protos.h: Add
>       aarch64_expand_split_stack_prologue and
>       aarch64_split_stack_space_check.
>       * gcc/config/aarch64/aarch64.c (aarch64_expand_prologue): Setup the
>       argument pointer (x10) for split-stack.
>       (aarch64_expand_builtin_va_start): Use internal argument pointer
>       instead of virtual_incoming_args_rtx.
>       (aarch64_expand_split_stack_prologue): New function.
>       (aarch64_file_end): Emit the split-stack note sections.
>       (aarch64_internal_arg_pointer): Likewise.
>       (aarch64_live_on_entry): Set the argument pointer for split-stack.
>       (aarch64_split_stack_space_check): Likewise.
>       (TARGET_ASM_FILE_END): New macro.
>       (TARGET_INTERNAL_ARG_POINTER): Likewise.
>       * gcc/config/aarch64/aarch64.h (aarch64_frame): Add
>       split_stack_arg_pointer to setup the argument pointer when using
>       split-stack.
>       * gcc/config/aarch64/aarch64.md (UNSPEC_STACK_CHECK): New unspec.
>       (UNSPECV_SPLIT_STACK_RETURN): Likewise.
>       (split_stack_prologue): New expand.
>       (split_stack_return): New insn.
>       (split_stack_space_check): New expand.
>       * gcc/testsuite/gcc.dg/split-3.c (down): Call va_end after va_start.
>       * gcc/testsuite/gcc.dg/split-6.c (down): Likewise.
> 
> [1] https://sourceware.org/ml/libc-alpha/2016-07/msg00647.html
> 
> ---
>  gcc/ChangeLog                              |  33 ++++
>  gcc/common/config/aarch64/aarch64-common.c |  16 +-
>  gcc/config/aarch64/aarch64-linux.h         |   2 -
>  gcc/config/aarch64/aarch64-protos.h        |   2 +
>  gcc/config/aarch64/aarch64.c               | 230 +++++++++++++++++++++++-
>  gcc/config/aarch64/aarch64.h               |   3 +
>  gcc/config/aarch64/aarch64.md              |  32 ++++
>  gcc/testsuite/gcc.dg/split-3.c             |   1 +
>  gcc/testsuite/gcc.dg/split-6.c             |   1 +
>  libgcc/ChangeLog                           |  11 ++
>  libgcc/config.host                         |   1 +
>  libgcc/config/aarch64/morestack-c.c        |  95 ++++++++++
>  libgcc/config/aarch64/morestack.S          | 269 +++++++++++++++++++++++++++++
>  libgcc/config/aarch64/t-stack-aarch64      |   3 +
>  libgcc/generic-morestack.c                 |   1 +
>  15 files changed, 696 insertions(+), 4 deletions(-)
>  create mode 100644 libgcc/config/aarch64/morestack-c.c
>  create mode 100644 libgcc/config/aarch64/morestack.S
>  create mode 100644 libgcc/config/aarch64/t-stack-aarch64
> 
> diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
> index 08e7959..01c3239 100644
> --- a/gcc/common/config/aarch64/aarch64-common.c
> +++ b/gcc/common/config/aarch64/aarch64-common.c
> @@ -106,6 +106,21 @@ aarch64_handle_option (struct gcc_options *opts,
>      }
>  }
>  
> +/* -fsplit-stack uses a TCB field available on glibc-2.25.  GLIBC also
> +   exports a symbol, __tcb_private_ss, to signal it has the field available
> +   on TCB allocation.  This aims to prevent binaries linked against newer
> +   GLIBC from running on non-supported ones.  */
> +
> +static bool
> +aarch64_supports_split_stack (bool report ATTRIBUTE_UNUSED,
> +                           struct gcc_options *opts ATTRIBUTE_UNUSED)
> +{
> +  return true;
> +}
> +
> +#undef TARGET_SUPPORTS_SPLIT_STACK
> +#define TARGET_SUPPORTS_SPLIT_STACK aarch64_supports_split_stack
> +
>  struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
>  
>  /* An ISA extension in the co-processor and main instruction set space.  */
> @@ -342,4 +357,3 @@ aarch64_rewrite_mcpu (int argc, const char **argv)
>  }
>  
>  #undef AARCH64_CPU_NAME_LENGTH
> -
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index 5fcaa59..ab3208b 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -80,8 +80,6 @@
>      }                                                \
>    while (0)
>  
> -#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
> -
>  /* Uninitialized common symbols in non-PIE executables, even with
>     strong definitions in dependent shared libraries, will resolve
>     to COPY relocated symbol in the executable.  See PR65780.  */
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 3cdd69b..82a4e11 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -377,6 +377,8 @@ void aarch64_err_no_fpadvsimd (machine_mode, const char *);
>  void aarch64_expand_epilogue (bool);
>  void aarch64_expand_mov_immediate (rtx, rtx);
>  void aarch64_expand_prologue (void);
> +void aarch64_expand_split_stack_prologue (void);
> +void aarch64_split_stack_space_check (rtx, rtx);
>  void aarch64_expand_vector_init (rtx, rtx);
>  void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
>                                  const_tree, unsigned);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index e56398a..2cf239f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3227,6 +3227,34 @@ aarch64_expand_prologue (void)
>         RTX_FRAME_RELATED_P (insn) = 1;
>       }
>      }
> +
> +  if (flag_split_stack && offset)
> +    {
> +      /* Setup the argument pointer (x10) for -fsplit-stack code.  If
> +      __morestack was called, it will have left the arg pointer to the
> +      old stack in x28.  Otherwise, the argument pointer is the top
> +      of current frame.  */
> +      rtx x10 = gen_rtx_REG (Pmode, R10_REGNUM);
> +      rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM);
> +      rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM);
> +      rtx x29 = gen_rtx_REG (Pmode, R29_REGNUM);
> +      rtx not_more = gen_label_rtx ();
> +      rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> +      rtx jump;
> +
> +      emit_move_insn (x11, GEN_INT (hard_fp_offset));
> +      emit_insn (gen_add3_insn (x10, x29, x11));
> +      jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +                                gen_rtx_GEU (VOIDmode, cc_reg,
> +                                             const0_rtx),
> +                                gen_rtx_LABEL_REF (VOIDmode, not_more),
> +                                pc_rtx);
> +      jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
> +      JUMP_LABEL (jump) = not_more;
> +      LABEL_NUSES (not_more) += 1;
> +      emit_move_insn (x10, x28);
> +      emit_label (not_more);
> +    }
>  }
>  
>  /* Return TRUE if we can use a simple_return insn.
> @@ -3303,6 +3331,7 @@ aarch64_expand_epilogue (bool for_sibcall)
>        offset = offset - fp_offset;
>      }
>  
> +
>    if (offset > 0)
>      {
>        unsigned reg1 = cfun->machine->frame.wb_candidate1;
> @@ -9648,7 +9677,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>    /* Emit code to initialize STACK, which points to the next varargs stack
>       argument.  CUM->AAPCS_STACK_SIZE gives the number of stack words used
>       by named arguments.  STACK is 8-byte aligned.  */
> -  t = make_tree (TREE_TYPE (stack), virtual_incoming_args_rtx);
> +  t = make_tree (TREE_TYPE (stack), crtl->args.internal_arg_pointer);
>    if (cum->aapcs_stack_size > 0)
>      t = fold_build_pointer_plus_hwi (t, cum->aapcs_stack_size * UNITS_PER_WORD);
>    t = build2 (MODIFY_EXPR, TREE_TYPE (stack), stack, t);
> @@ -14010,6 +14039,196 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>      }
>  }
>  
> +/* -fsplit-stack support.  */
> +
> +/* A SYMBOL_REF for __morestack.  */
> +static GTY(()) rtx morestack_ref;
> +
> +/* Emit -fsplit-stack prologue, which goes before the regular function
> +   prologue.  */
> +void
> +aarch64_expand_split_stack_prologue (void)
> +{
> +  HOST_WIDE_INT frame_size, args_size;
> +  rtx_code_label *ok_label = NULL;
> +  rtx mem, ssvalue, compare, jump, insn, call_fusage;
> +  rtx reg11, reg30, temp;
> +  rtx new_cfa, cfi_ops = NULL;
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = 0x10;
> +  int ninsn;
> +
> +  gcc_assert (flag_split_stack && reload_completed);
> +
> +  /* Limit the total stack allocation to 2G so its value can be
> +     materialized with at most two instructions (movn/movk).  It might be
> +     used by the linker to add some extra space for split calling non split
> +     stack functions.  */
> +  frame_size = cfun->machine->frame.frame_size;
> +  if (frame_size > ((HOST_WIDE_INT) 1 << 31))
> +    {
> +      sorry ("Stack frame larger than 2G is not supported for -fsplit-stack");
> +      return;
> +    }
> +
> +  if (morestack_ref == NULL_RTX)
> +    {
> +      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
> +      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
> +                                        | SYMBOL_FLAG_FUNCTION);
> +    }
> +
> +  /* Load __private_ss from TCB.  */
> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> +  emit_move_insn (ssvalue, mem);
> +
> +  temp = gen_rtx_REG (Pmode, R10_REGNUM);
> +
> +  /* Always emit two insns to calculate the requested stack, so the linker
> +     can edit them when adjusting size for calling non-split-stack code.  */
> +  ninsn = aarch64_internal_mov_immediate (temp, GEN_INT (-frame_size), true,
> +                                       Pmode);
> +  gcc_assert (ninsn == 1 || ninsn == 2);
> +  if (ninsn == 1)
> +    emit_insn (gen_nop ());
> +  emit_insn (gen_add3_insn (temp, stack_pointer_rtx, temp));
> +
> +  compare = aarch64_gen_compare_reg (LT, temp, ssvalue);
> +
> +  /* Jump to __morestack call if current __private_ss does not suffice.  */
> +  ok_label = gen_label_rtx ();
> +  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +                            gen_rtx_GEU (VOIDmode, compare, const0_rtx),
> +                            gen_rtx_LABEL_REF (VOIDmode, ok_label),
> +                            pc_rtx);
> +  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
> +  JUMP_LABEL (jump) = ok_label;
> +
> +  /* Mark the jump as very likely to be taken.  */
> +  add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE / 100 - 1);
> +
> +  call_fusage = NULL_RTX;
> +
> +  /* Call __morestack with a non-standard call procedure: x10 will hold
> +     the requested stack pointer and x11 the required stack size to be
> +     copied.  */
> +  args_size = crtl->args.size >= 0 ? crtl->args.size : 0;
> +  reg11 = gen_rtx_REG (DImode, R11_REGNUM);
> +  emit_move_insn (reg11, GEN_INT (args_size));
> +  use_reg (&call_fusage, reg11);
> +
> +  /* Set up a minimum frame pointer to call __morestack.  The SP is not
> +     saved on x29 prior so in __morestack x29 points to the called SP.  */
> +  reg30 = gen_rtx_REG (Pmode, R30_REGNUM);
> +  aarch64_pushwb_single_reg (Pmode, R30_REGNUM, 16);
> +
> +  insn = emit_call_insn (gen_call (gen_rtx_MEM (DImode, morestack_ref),
> +                                const0_rtx, const0_rtx));
> +  add_function_usage_to (insn, call_fusage);
> +
> +  cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg30, cfi_ops);
> +  mem = plus_constant (Pmode, stack_pointer_rtx, 16);
> +  cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, stack_pointer_rtx, cfi_ops);
> +
> +  mem = gen_rtx_POST_MODIFY (Pmode, stack_pointer_rtx, mem);
> +  mem = gen_rtx_MEM (DImode, mem);
> +  insn = emit_move_insn (reg30, mem);
> +
> +  new_cfa = stack_pointer_rtx;
> +  new_cfa = plus_constant (Pmode, new_cfa, 16);
> +  cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops);
> +  REG_NOTES (insn) = cfi_ops;
> +  RTX_FRAME_RELATED_P (insn) = 1;
> +
> +  emit_insn (gen_split_stack_return ());
> +
> +  emit_label (ok_label);
> +  LABEL_NUSES (ok_label) = 1;
> +}
> +
> +/* Implement TARGET_ASM_FILE_END.  */
> +static void
> +aarch64_file_end (void)
> +{
> +  file_end_indicate_exec_stack ();
> +
> +  if (flag_split_stack)
> +    file_end_indicate_split_stack ();
> +}
> +
> +/* Return the internal arg pointer used for function incoming arguments.  */
> +static rtx
> +aarch64_internal_arg_pointer (void)
> +{
> +  if (flag_split_stack)
> +    {
> +      if (cfun->machine->frame.split_stack_arg_pointer == NULL_RTX)
> +     {
> +       rtx pat;
> +
> +       cfun->machine->frame.split_stack_arg_pointer = gen_reg_rtx (Pmode);
> +       REG_POINTER (cfun->machine->frame.split_stack_arg_pointer) = 1;
> +
> +       /* Put the pseudo initialization right after the note at the
> +          beginning of the function.  */
> +       pat = gen_rtx_SET (cfun->machine->frame.split_stack_arg_pointer,
> +                          gen_rtx_REG (Pmode, R10_REGNUM));
> +       push_topmost_sequence ();
> +       emit_insn_after (pat, get_insns ());
> +       pop_topmost_sequence ();
> +     }
> +      return plus_constant (Pmode, cfun->machine->frame.split_stack_arg_pointer,
> +                         FIRST_PARM_OFFSET (current_function_decl));
> +    }
> +  return virtual_incoming_args_rtx;
> +}
> +
> +static void
> +aarch64_live_on_entry (bitmap regs)
> +{
> +  if (flag_split_stack)
> +    bitmap_set_bit (regs, R10_REGNUM);
> +}
> +
> +/* Emit -fsplit-stack dynamic stack allocation space check.  */
> +
> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
> +{
> +  rtx mem, ssvalue, compare, jump;
> +  rtx requested = gen_reg_rtx (Pmode);
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = 0x10;
> +
> +  /* Load __private_ss from TCB.  */
> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> +  emit_move_insn (ssvalue, mem);
> +
> +  /* And compare it with the stack pointer minus the required stack.  */
> +  if (CONST_INT_P (size))
> +     emit_insn (gen_add3_insn (requested, stack_pointer_rtx,
> +                            GEN_INT (-INTVAL (size))));
> +  else
> +    {
> +      size = force_reg (Pmode, size);
> +      emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx,
> +                                             size));
> +    }
> +
> +  /* Jump to __morestack call if current __private_ss does not suffice.  */
> +  compare = aarch64_gen_compare_reg (LT, requested, ssvalue);
> +  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +                            gen_rtx_GEU (VOIDmode, compare, const0_rtx),
> +                            gen_rtx_LABEL_REF (VOIDmode, label),
> +                            pc_rtx);
> +  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
> +  JUMP_LABEL (jump) = label;
> +}
> +
>  #undef TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST aarch64_address_cost
>  
> @@ -14036,6 +14255,9 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>  #undef TARGET_ASM_FILE_START
>  #define TARGET_ASM_FILE_START aarch64_start_file
>  
> +#undef TARGET_ASM_FILE_END
> +#define TARGET_ASM_FILE_END aarch64_file_end
> +
>  #undef TARGET_ASM_OUTPUT_MI_THUNK
>  #define TARGET_ASM_OUTPUT_MI_THUNK aarch64_output_mi_thunk
>  
> @@ -14118,6 +14340,12 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>  #undef TARGET_FRAME_POINTER_REQUIRED
>  #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
>  
> +#undef TARGET_EXTRA_LIVE_ON_ENTRY
> +#define TARGET_EXTRA_LIVE_ON_ENTRY aarch64_live_on_entry
> +
> +#undef TARGET_INTERNAL_ARG_POINTER
> +#define TARGET_INTERNAL_ARG_POINTER aarch64_internal_arg_pointer
> +
>  #undef TARGET_GIMPLE_FOLD_BUILTIN
>  #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 1915980..0ba3172 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -570,6 +570,9 @@ struct GTY (()) aarch64_frame
>  
>    HOST_WIDE_INT frame_size;
>  
> +  /* Alternative internal arg pointer for -fsplit-stack.  */
> +  rtx split_stack_arg_pointer;
> +
>    bool laid_out;
>  };
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 9e87a0d..8992608 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -130,6 +130,7 @@
>      UNSPEC_VSTRUCTDUMMY
>      UNSPEC_SP_SET
>      UNSPEC_SP_TEST
> +    UNSPEC_STACK_CHECK
>      UNSPEC_RSQRT
>      UNSPEC_RSQRTE
>      UNSPEC_RSQRTS
> @@ -144,6 +145,7 @@
>      UNSPECV_SET_FPSR         ; Represent assign of FPSR content.
>      UNSPECV_BLOCKAGE         ; Represent a blockage
>      UNSPECV_PROBE_STACK_RANGE        ; Represent stack range probing.
> +    UNSPECV_SPLIT_STACK_RETURN  ; Represent a camouflaged return
>    ]
>  )
>  
> @@ -5394,3 +5396,33 @@
>  
>  ;; ldp/stp peephole patterns
>  (include "aarch64-ldpstp.md")
> +
> +;; Handle -fsplit-stack
> +(define_expand "split_stack_prologue"
> +  [(const_int 0)]
> +  ""
> +{
> +  aarch64_expand_split_stack_prologue ();
> +  DONE;
> +})
> +
> +;; A return instruction which the middle-end does not see.
> +(define_insn "split_stack_return"
> +  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_RETURN)]
> +  ""
> +  "ret"
> +  [(set_attr "type" "branch")])
> +
> +;; If there are operand 0 bytes available on the stack, jump to
> +;; operand 1.
> +(define_expand "split_stack_space_check"
> +  [(set (match_dup 2) (compare:CC (match_dup 3) (match_dup 2)))
> +   (set (pc) (if_then_else
> +           (geu (match_dup 4) (const_int 0))
> +           (label_ref (match_operand 1))
> +           (pc)))]
> +  ""
> +{
> +  aarch64_split_stack_space_check (operands[0], operands[1]);
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.dg/split-3.c b/gcc/testsuite/gcc.dg/split-3.c
> index 64bbb8c..5ba7616 100644
> --- a/gcc/testsuite/gcc.dg/split-3.c
> +++ b/gcc/testsuite/gcc.dg/split-3.c
> @@ -40,6 +40,7 @@ down (int i, ...)
>        || va_arg (ap, int) != 9
>        || va_arg (ap, int) != 10)
>      abort ();
> +  va_end (ap);
>  
>    if (i > 0)
>      {
> diff --git a/gcc/testsuite/gcc.dg/split-6.c b/gcc/testsuite/gcc.dg/split-6.c
> index b32cf8d..b3016ba 100644
> --- a/gcc/testsuite/gcc.dg/split-6.c
> +++ b/gcc/testsuite/gcc.dg/split-6.c
> @@ -37,6 +37,7 @@ down (int i, ...)
>        || va_arg (ap, int) != 9
>        || va_arg (ap, int) != 10)
>      abort ();
> +  va_end (ap);
>  
>    if (i > 0)
>      {
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 4ccf25d..18f49f1 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -336,6 +336,7 @@ aarch64*-*-linux*)
>       md_unwind_header=aarch64/linux-unwind.h
>       tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
>       tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> +     tmake_file="${tmake_file} t-stack aarch64/t-stack-aarch64"
>       ;;
>  alpha*-*-linux*)
>       tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee t-crtfm alpha/t-linux"
> diff --git a/libgcc/config/aarch64/morestack-c.c b/libgcc/config/aarch64/morestack-c.c
> new file mode 100644
> index 0000000..8df7895
> --- /dev/null
> +++ b/libgcc/config/aarch64/morestack-c.c
> @@ -0,0 +1,95 @@
> +/* AArch64 support for -fsplit-stack.
> + * Copyright (C) 2016 Free Software Foundation, Inc.
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 3, or (at your option) any
> + * later version.
> + *
> + * This file is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * Under Section 7 of GPL version 3, you are granted additional
> + * permissions described in the GCC Runtime Library Exception, version
> + * 3.1, as published by the Free Software Foundation.
> + *
> + * You should have received a copy of the GNU General Public License and
> + * a copy of the GCC Runtime Library Exception along with this program;
> + * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> + * <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef inhibit_libc
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stddef.h>
> +#include "generic-morestack.h"
> +
> +/* This is based on GLIBC definition (version 2.24).  There is no need to
> +   keep it in sync since new fields are added at the end of the structure and
> +   do not change the '__private_ss' layout.  */
> +typedef struct
> +{
> +  void *dtv;
> +  void *private;
> +  void *__private_ss;
> +} tcbhead_t;
> +
> +#define INITIAL_STACK_SIZE  0x4000
> +#define BACKOFF             0x1000
> +
> +void __generic_morestack_set_initial_sp (void *sp, size_t len);
> +void *__morestack_get_guard (void);
> +void __morestack_set_guard (void *);
> +void *__morestack_make_guard (void *stack, size_t size);
> +void __morestack_load_mmap (void);
> +
> +/* We declare it as weak so it fails either at stack linking or
> +   at runtime if the GLIBC does not have the required TCB field.  */
> +extern void __tcb_private_ss (void) __attribute__ ((weak));
> +
> +/* Initialize the stack guard when the program starts or when a new
> +   thread starts.  This is called from a constructor using ctors section.  */
> +void
> +__stack_split_initialize (void)
> +{
> +  __tcb_private_ss ();
> +
> +  register void* sp __asm__ ("sp");
> +  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
> +  tcb->__private_ss = (void*)((uintptr_t)sp - INITIAL_STACK_SIZE);
> +  return __generic_morestack_set_initial_sp (sp, INITIAL_STACK_SIZE);
> +}
> +
> +/* Return current __private_ss.  */
> +void *
> +__morestack_get_guard (void)
> +{
> +  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
> +  return tcb->__private_ss;
> +}
> +
> +/* Set __private_ss to ptr.  */
> +void
> +__morestack_set_guard (void *ptr)
> +{
> +  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
> +  tcb->__private_ss = ptr;
> +}
> +
> +/* Return the stack guard value for given stack.  */
> +void *
> +__morestack_make_guard (void *stack, size_t size)
> +{
> +  return (void*)((uintptr_t)stack - size + BACKOFF);
> +}
> +
> +/* Make __stack_split_initialize a high priority constructor.  */
> +static void (*const ctors []) 
> +  __attribute__ ((used, section (".ctors.65535"), aligned (sizeof (void *))))
> +  = { __stack_split_initialize, __morestack_load_mmap };
> +
> +#endif /* !defined (inhibit_libc) */
> diff --git a/libgcc/config/aarch64/morestack.S b/libgcc/config/aarch64/morestack.S
> new file mode 100644
> index 0000000..5bbac4c
> --- /dev/null
> +++ b/libgcc/config/aarch64/morestack.S
> @@ -0,0 +1,269 @@
> +# AArch64 support for -fsplit-stack.
> +# Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +# This file is part of GCC.
> +
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +
> +# Under Section 7 of GPL version 3, you are granted additional
> +# permissions described in the GCC Runtime Library Exception, version
> +# 3.1, as published by the Free Software Foundation.
> +
> +# You should have received a copy of the GNU General Public License and
> +# a copy of the GCC Runtime Library Exception along with this program;
> +# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +/* Define an entry point visible from C.  */
> +#define ENTRY(name)                                          \
> +  .globl name;                                                       \
> +  .type name,%function;                                              \
> +  .align 4;                                                  \
> +  name##:
> +
> +#define END(name)                                            \
> +  .size name,.-name
> +
> +
> +#define MORESTACK_FRAMESIZE  112
> +/* Offset based on function stack to get its argument from __morestack
> +   frame.  */
> +#define STACKFRAME_BASE              (-MORESTACK_FRAMESIZE - 16)
> +/* Offset from __morestack frame where the new stack size is saved and
> +   passed to __generic_morestack.  */
> +#define NEWSTACK_SAVE                88
> +/* Offset from __morestack frame where the arguments size is saved and
> +   passed to __generic_morestack.  */
> +#define ARGS_SIZE_SAVE               80
> +
> +#define BACKOFF                      0x2000
> +# Large excess allocated when calling non-split-stack code.
> +#define NON_SPLIT_STACK              0x100000
> +
> +# TCB offset of __private_ss
> +#define TCB_PRIVATE_SS               #16
> +
> +     .text
> +ENTRY(__morestack_non_split)
> +     .cfi_startproc
> +# We use a cleanup to restore the tcbhead_t.__private_ss if
> +# an exception is thrown through this code.
> +     add     x11, x11, NON_SPLIT_STACK
> +     .cfi_endproc
> +END(__morestack_non_split)
> +# Fall through into __morestack
> +
> +# This function is called with non-standard calling conventions.  On entry
> +# x10 is the requested stack pointer.  The split-stack prologue is in the
> +# form:
> +#
> +#    mrs    x9, tpidr_el0
> +#    mov    x10, -<required stack allocation>
> +#    add    x10, sp, x10
> +#    ldr    x9, [x9, 16]
> +#    cmp    x10, x9
> +#    bcs    enough
> +#    stp    x30, [sp, -16]!
> +#    mov    x11, <required arguments copy size>
> +#    bl     __morestack
> +#    ldp    x30, [sp], 16
> +#    ret
> +# enough:
> +#
> +# The normal function prologue follows here, with a small addition at the
> +# end to set up the argument pointer.  The argument pointer is setup with:
> +#
> +#    mov     x11, <required stack allocation>
> +#    sub     sp, sp, <required stack allocation>
> +#    add     x10, x29, x11
> +#    b.cs    function
> +#    mov     x10, x28
> +# function:
> +#
> +# Note that all argument parameter registers and the x10 (the argument
> +# pointer) are saved.  The N bit is also saved and restored to indicate
> +# that the function is called (so the prologue addition can set up the
> +# argument pointer correctly).
> +
> +ENTRY(__morestack)
> +.LFB1:
> +     .cfi_startproc
> +
> +#ifdef __PIC__
> +     .cfi_personality 0x9b,DW.ref.__gcc_personality_v0
> +     .cfi_lsda 0x1b,.LLSDA1
> +#else
> +     .cfi_personality 0x3,__gcc_personality_v0
> +     .cfi_lsda 0x3,.LLSDA1
> +#endif
> +
> +     # Calculate requested stack size.
> +     sub     x12, sp, x10
> +     # Save parameters
> +     stp     x29, x30, [sp, -MORESTACK_FRAMESIZE]!
> +     .cfi_def_cfa_offset MORESTACK_FRAMESIZE
> +     .cfi_offset 29, -MORESTACK_FRAMESIZE
> +     .cfi_offset 30, -MORESTACK_FRAMESIZE+8
> +     add     x29, sp, 0
> +     .cfi_def_cfa_register 29
> +     # Adjust the requested stack size for the frame pointer save.
> +     add     x12, x12, 16
> +     stp     x0, x1, [sp, 16]
> +     stp     x2, x3, [sp, 32]
> +     add     x12, x12, BACKOFF
> +     stp     x4, x5, [sp, 48]
> +     stp     x6, x7, [sp, 64]
> +     stp     x11, x12, [sp, 80]
> +     str     x28, [sp, 96]
> +
> +     # Setup on x28 the function initial frame pointer.  Its value will be
> +     # copied to function argument pointer.
> +     add     x28, sp, MORESTACK_FRAMESIZE + 16
> +
> +     # void __morestack_block_signals (void)
> +     bl      __morestack_block_signals
> +
> +     # void *__generic_morestack (size_t *pframe_size,
> +     #                            void *old_stack,
> +     #                            size_t param_size)
> +     # pframe_size: points to the size of the required stack frame; on
> +     #              return, set to the space remaining on the allocated stack.
> +     # old_stack: points at the parameters on the old stack.
> +     # param_size: size in bytes of parameters to copy to the new stack.
> +     add     x0, x28, STACKFRAME_BASE + NEWSTACK_SAVE
> +     mov     x1, x28
> +     ldr     x2, [sp, ARGS_SIZE_SAVE]
> +     bl      __generic_morestack
> +
> +     # Start using new stack
> +     stp     x29, x30, [x0, -16]!
> +     mov     sp, x0
> +
> +     # Set __private_ss stack guard for the new stack.
> +     ldr     x9, [x28, STACKFRAME_BASE + NEWSTACK_SAVE]
> +     add     x0, x0, BACKOFF
> +     sub     x0, x0, 16
> +     sub     x0, x0, x9
> +.LEHB0:
> +     mrs     x1, tpidr_el0
> +     str     x0, [x1, TCB_PRIVATE_SS]
> +
> +     # void __morestack_unblock_signals (void)
> +     bl      __morestack_unblock_signals
> +
> +     # Set up for a call to the target function.
> +     #ldp    x29, x30, [x28, STACKFRAME_BASE]
> +     ldr     x30, [x28, STACKFRAME_BASE + 8]
> +     ldp     x0, x1, [x28, STACKFRAME_BASE + 16]
> +     ldp     x2, x3, [x28, STACKFRAME_BASE + 32]
> +     ldp     x4, x5, [x28, STACKFRAME_BASE + 48]
> +     ldp     x6, x7, [x28, STACKFRAME_BASE + 64]
> +     add     x9, x30, 8
> +     cmp     x30, x9
> +     blr     x9
> +
> +     stp     x0, x1, [x28, STACKFRAME_BASE + 16]
> +     stp     x2, x3, [x28, STACKFRAME_BASE + 32]
> +     stp     x4, x5, [x28, STACKFRAME_BASE + 48]
> +     stp     x6, x7, [x28, STACKFRAME_BASE + 64]
> +
> +     bl      __morestack_block_signals
> +
> +     # void *__generic_releasestack (size_t *pavailable)
> +     add     x0, x28, STACKFRAME_BASE + NEWSTACK_SAVE
> +     bl      __generic_releasestack
> +
> +     # Reset __private_ss stack guard to value for old stack
> +     ldr     x9, [x28, STACKFRAME_BASE + NEWSTACK_SAVE]
> +     add     x0, x0, BACKOFF
> +     sub     x0, x0, x9
> +
> +     # Update TCB split stack field
> +.LEHE0:
> +     mrs     x1, tpidr_el0
> +     str     x0, [x1, TCB_PRIVATE_SS]
> +
> +     bl      __morestack_unblock_signals
> +
> +     # Use old stack again.
> +     sub     sp, x28, 16
> +
> +     ldp     x0, x1, [x28, STACKFRAME_BASE + 16]
> +     ldp     x2, x3, [x28, STACKFRAME_BASE + 32]
> +     ldp     x4, x5, [x28, STACKFRAME_BASE + 48]
> +     ldp     x6, x7, [x28, STACKFRAME_BASE + 64]
> +     ldp     x29, x30, [x28, STACKFRAME_BASE]
> +     ldr     x28, [x28, STACKFRAME_BASE + 96]
> +
> +     .cfi_remember_state
> +     .cfi_restore 30
> +     .cfi_restore 29
> +     .cfi_def_cfa 31, 0
> +
> +     ret
> +
> +# This is the cleanup code called by the stack unwinder when
> +# unwinding through code between .LEHB0 and .LEHE0 above.
> +cleanup:
> +     .cfi_restore_state
> +     str     x0, [x28, STACKFRAME_BASE]
> +     # size_t __generic_findstack (void *stack)
> +     mov     x0, x28
> +     bl      __generic_findstack
> +     sub     x0, x28, x0
> +     add     x0, x0, BACKOFF
> +     # Restore tcbhead_t.__private_ss
> +     mrs     x1, tpidr_el0
> +     str     x0, [x1, TCB_PRIVATE_SS]
> +     ldr     x0, [x28, STACKFRAME_BASE]
> +     b       _Unwind_Resume
> +        .cfi_endproc
> +END(__morestack)
> +
> +     .section .gcc_except_table,"a",@progbits
> +     .align 4
> +.LLSDA1:
> +     # @LPStart format (omit)
> +        .byte   0xff
> +     # @TType format (omit)
> +        .byte   0xff
> +     # Call-site format (uleb128)
> +        .byte   0x1
> +     # Call-site table length
> +        .uleb128 .LLSDACSE1-.LLSDACSB1
> +.LLSDACSB1:
> +     # region 0 start
> +        .uleb128 .LEHB0-.LFB1
> +     # length
> +        .uleb128 .LEHE0-.LEHB0
> +     # landing pad
> +        .uleb128 cleanup-.LFB1
> +     # no action (i.e. a cleanup)
> +        .uleb128 0
> +.LLSDACSE1:
> +
> +
> +     .global __gcc_personality_v0
> +#ifdef __PIC__
> +     # Build a position independent reference to the personality function.
> +     .hidden DW.ref.__gcc_personality_v0
> +     .weak   DW.ref.__gcc_personality_v0
> +     .section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
> +     .type   DW.ref.__gcc_personality_v0, @object
> +     .align 3
> +DW.ref.__gcc_personality_v0:
> +     .size   DW.ref.__gcc_personality_v0, 8
> +     .quad   __gcc_personality_v0
> +#endif
> +
> +     .section .note.GNU-stack,"",@progbits
> +     .section .note.GNU-split-stack,"",@progbits
> +     .section .note.GNU-no-split-stack,"",@progbits
> diff --git a/libgcc/config/aarch64/t-stack-aarch64 b/libgcc/config/aarch64/t-stack-aarch64
> new file mode 100644
> index 0000000..4babb4e
> --- /dev/null
> +++ b/libgcc/config/aarch64/t-stack-aarch64
> @@ -0,0 +1,3 @@
> +# Makefile fragment to support -fsplit-stack for aarch64.
> +LIB2ADD_ST += $(srcdir)/config/aarch64/morestack.S \
> +           $(srcdir)/config/aarch64/morestack-c.c
> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
> index b8eec4e..fe7092b 100644
> --- a/libgcc/generic-morestack.c
> +++ b/libgcc/generic-morestack.c
> @@ -943,6 +943,7 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,
>        nsp -= 2 * 160;
>  #elif defined __s390__
>        nsp -= 2 * 96;
> +#elif defined __aarch64__
>  #else
>  #error "unrecognized target"
>  #endif
> 
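
For anyone reviewing the prologue comment in morestack.S, the stack-limit
check it describes can be modelled in C roughly as below.  This is only a
sketch under assumptions: split_stack_has_room is a hypothetical name that
does not appear in the patch, and guard stands for the __private_ss value
the prologue loads from the TCB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: models the prologue's
   "add x10, sp, x10; cmp x10, x9; b.cs enough" sequence.  */
static bool
split_stack_has_room (uint64_t sp, uint64_t frame_size, uint64_t guard)
{
  /* The cover letter limits a frame to at most 2 GB so that the
     negated size can be built with two instructions.  */
  assert (frame_size <= (UINT64_C (1) << 31));

  /* x10 = sp - frame_size (the prologue adds the negated size).  */
  uint64_t requested_sp = sp - frame_size;

  /* b.cs is taken when the unsigned comparison x10 >= x9 holds,
     so the function stays on the current stack segment.  */
  return requested_sp >= guard;
}
```

With sp = 0x9000 and guard = 0x8000, a 0x1000-byte frame still fits because
the comparison is inclusive, while a 0x2000-byte frame would take the
__morestack path.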
