On 19/09/14 15:35, Wilco Dijkstra wrote:
Jiong Wang wrote:

when generating instructions to access local variable, for example a local 
array,

if the array size very big, then we need a temp reg to keep the intermediate 
index,
then use that temp reg as base reg, so that ldr is capable of indexing the 
element.

while this will cause trouble, because the introduce of temp reg break the 
dependence
between the stack variable access and stack adjustment instructions which is 
unsafe
when signal trampoline executed.

this patch add barrier before stack adjustment in epilogue.

ok for trunk?
I believe you need more barriers. Ie. for all SP modifying instructions 
(including ldp
with writeback) except for ones that just remove the outgoing arguments. You 
can avoid
emitting barriers if alloca is not used and there are no locals in the frame 
(common case).

Basically without that any memory access that may alias with the locals could be
scheduled incorrectly. It seems odd that the scheduler does not understand this 
by
default.

thanks for pointing this out.

patch updated, please review.


2014-09-25  Jiong Wang  <jiong.w...@arm.com>
2014-09-25  Wilco Dijkstra  <wilco.dijks...@arm.com>

gcc/
  PR target/63293
  * config/aarch64/aarch64.c (aarch64_expand_epiloue): Add barriers before 
stack adjustment.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 023f9fd..d258425 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2331,6 +2331,9 @@ aarch64_expand_epilogue (bool for_sibcall)
   HOST_WIDE_INT fp_offset;
   HOST_WIDE_INT hard_fp_offset;
   rtx_insn *insn;
+  /* We need to add memory barrier to prevent read from deallocated stack.  */
+  bool need_barrier_p = (get_frame_size () != 0
+			 || cfun->machine->frame.saved_varargs_size);
 
   aarch64_layout_frame ();
 
@@ -2365,6 +2368,9 @@ aarch64_expand_epilogue (bool for_sibcall)
   if (frame_pointer_needed
       && (crtl->outgoing_args_size || cfun->calls_alloca))
     {
+      if (cfun->calls_alloca)
+	emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
+
       insn = emit_insn (gen_add3_insn (stack_pointer_rtx,
 				       hard_frame_pointer_rtx,
 				       GEN_INT (0)));
@@ -2391,6 +2397,9 @@ aarch64_expand_epilogue (bool for_sibcall)
       aarch64_restore_callee_saves (DFmode, fp_offset, V0_REGNUM, V31_REGNUM,
 				    skip_wb, &cfi_ops);
 
+      if (need_barrier_p)
+	emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
+
       if (skip_wb)
 	{
 	  enum machine_mode mode1 = (reg1 <= R30_REGNUM) ? DImode : DFmode;
@@ -2431,6 +2440,9 @@ aarch64_expand_epilogue (bool for_sibcall)
 
   if (frame_size > 0)
     {
+      if (need_barrier_p)
+	emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
+
       if (frame_size >= 0x1000000)
 	{
 	  rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);

Reply via email to