Whee, fun, this appears to have been broken for a very long time,
possibly since the introduction of -fstack-check in 2010.  Thankfully it
only affects 32 bit and only in relatively constrained circumstances.

-fstack-check and -fstack-clash both potentially create a loop for stack
probing.  In that case they both need a scratch register to hold the
loop upper bound.

The code to allocate a scratch register first starts with the
caller-saved registers as they're zero cost.  Then it'll use any callee
saved register that is actually saved.  If neither is available (say
because all the caller-saved regs are used for parameter passing and
there are no callee saved register used), then the allocation routine
will push %eax on the stack and the deallocation routine will pop it off
to restore its value.

Of course there is a *stack allocation* between those two points.  So
the pop ends up restoring garbage into the register.

Obviously the restore routine needs to use reg+d addressing to get to
the stack slot and the allocated space needs to be deallocated by the
epilogue.  But sadly there's enough assertions sprinkled around that
prevent that from working as-is.

So what this patch does is continue to use the push to allocate the
register.  And it uses reg+d to restore the register after the probing
loops.  The "trick" is that the space allocated by the push becomes part
of the normal stack frame after we restore the scratch register's value.
 Ie, if we push a 4 byte register, then we reduce the size of the main
allocation request by 4 bytes.  And everything just works.

Clearly this code had not be exercised.  So I hacked up things so that
we always generated a probing loop and always assumed that we needed to
save/restore a scratch register and enabled stack clash protections by
default.  I bootstrapped that and compared testsuite runs against a run
which just had stack clash protection on by default.  That did turn up
an issue, but it was with my testing hacks, not with this patch :-)


I (of course) also did the usual bootstrap and regression tests, using
x86_64 and i686.  Hopefully this is the last iteration on the x86/x86_64
stack clash bits :-)

The one concern I have is do we need to tell the CFI machinery that
%eax's value was restored to its entry value?

OK for the trunk?  Or do we need some magic CFI bit to describe the
restore of %eax?

Thanks,
Jeff
        PR target/84128
        * config/i386/i386.c (release_scratch_register_on_entry): Add new
        OFFSET argument.  Use it to restore the scratch register rathern
        than a pop insn.
        (ix86_adjust_stack_and_probe_stack_clash): Un-constify SIZE.
        If we have to save a temporary register, decrement SIZE appropriately.
        Pass SIZE as the offset to find the saved register into
        release_scratch_register_on_entry.
        (ix86_adjust_stack_and_probe): Likewise.
        (ix86_emit_probe_stack_range): Likewise.


        PR target/84128
        * gcc.target/i386/pr84128.c: New test.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fef34a1..93ce79c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12567,22 +12567,25 @@ get_scratch_register_on_entry (struct scratch_reg *sr)
     }
 }
 
-/* Release a scratch register obtained from the preceding function.  */
+/* Release a scratch register obtained from the preceding function.
+
+   Note there will always be some kind of stack adjustment between
+   allocation and releasing the scratch register.  So we can't just
+   pop the scratch register off the stack if we were forced to save it
+   (the stack pointer itself has a different value).
+
+   Instead we're passed the offset into the stack where the value will
+   be found and the space becomes part of the local frame that is
+   deallocated by the epilogue.  */
 
 static void
-release_scratch_register_on_entry (struct scratch_reg *sr)
+release_scratch_register_on_entry (struct scratch_reg *sr, HOST_WIDE_INT 
offset)
 {
   if (sr->saved)
     {
-      struct machine_function *m = cfun->machine;
-      rtx x, insn = emit_insn (gen_pop (sr->reg));
-
-      /* The RTX_FRAME_RELATED_P mechanism doesn't know about pop.  */
-      RTX_FRAME_RELATED_P (insn) = 1;
-      x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (UNITS_PER_WORD));
-      x = gen_rtx_SET (stack_pointer_rtx, x);
-      add_reg_note (insn, REG_FRAME_RELATED_EXPR, x);
-      m->fs.sp_offset -= UNITS_PER_WORD;
+      rtx x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (offset));
+      x = gen_rtx_SET (sr->reg, gen_rtx_MEM (word_mode, x));
+      emit_insn (x);
     }
 }
 
@@ -12597,7 +12600,7 @@ release_scratch_register_on_entry (struct scratch_reg 
*sr)
    pushed on the stack.  */
 
 static void
-ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size,
+ix86_adjust_stack_and_probe_stack_clash (HOST_WIDE_INT size,
                                         const bool int_registers_saved)
 {
   struct machine_function *m = cfun->machine;
@@ -12713,6 +12716,12 @@ ix86_adjust_stack_and_probe_stack_clash (const 
HOST_WIDE_INT size,
       struct scratch_reg sr;
       get_scratch_register_on_entry (&sr);
 
+      /* If we needed to save a register, then account for any space
+        that was pushed (we are not going to pop the register when
+        we do the restore).  */
+      if (sr.saved)
+       size -= UNITS_PER_WORD;
+
       /* Step 1: round SIZE down to a multiple of the interval.  */
       HOST_WIDE_INT rounded_size = size & -probe_interval;
 
@@ -12761,7 +12770,9 @@ ix86_adjust_stack_and_probe_stack_clash (const 
HOST_WIDE_INT size,
                                   m->fs.cfa_reg == stack_pointer_rtx);
       dump_stack_clash_frame_info (PROBE_LOOP, size != rounded_size);
 
-      release_scratch_register_on_entry (&sr);
+      /* This does not deallocate the space reserved for the scratch
+        register.  That will be deallocated in the epilogue.  */
+      release_scratch_register_on_entry (&sr, size);
     }
 
   /* Make sure nothing is scheduled before we are done.  */
@@ -12774,7 +12785,7 @@ ix86_adjust_stack_and_probe_stack_clash (const 
HOST_WIDE_INT size,
    pushed on the stack.  */
 
 static void
-ix86_adjust_stack_and_probe (const HOST_WIDE_INT size,
+ix86_adjust_stack_and_probe (HOST_WIDE_INT size,
                             const bool int_registers_saved)
 {
   /* We skip the probe for the first interval + a small dope of 4 words and
@@ -12847,6 +12858,11 @@ ix86_adjust_stack_and_probe (const HOST_WIDE_INT size,
 
       get_scratch_register_on_entry (&sr);
 
+      /* If we needed to save a register, then account for any space
+        that was pushed (we are not going to pop the register when
+        we do the restore).  */
+      if (sr.saved)
+       size -= UNITS_PER_WORD;
 
       /* Step 1: round SIZE to the previous multiple of the interval.  */
 
@@ -12906,7 +12922,9 @@ ix86_adjust_stack_and_probe (const HOST_WIDE_INT size,
                                                    (get_probe_interval ()
                                                     + dope))));
 
-      release_scratch_register_on_entry (&sr);
+      /* This does not deallocate the space reserved for the scratch
+        register.  That will be deallocated in the epilogue.  */
+      release_scratch_register_on_entry (&sr, size);
     }
 
   /* Even if the stack pointer isn't the CFA register, we need to correctly
@@ -13015,6 +13033,11 @@ ix86_emit_probe_stack_range (HOST_WIDE_INT first, 
HOST_WIDE_INT size,
 
       get_scratch_register_on_entry (&sr);
 
+      /* If we needed to save a register, then account for any space
+        that was pushed (we are not going to pop the register when
+        we do the restore).  */
+      if (sr.saved)
+       size -= UNITS_PER_WORD;
 
       /* Step 1: round SIZE to the previous multiple of the interval.  */
 
@@ -13055,7 +13078,9 @@ ix86_emit_probe_stack_range (HOST_WIDE_INT first, 
HOST_WIDE_INT size,
                                                       sr.reg),
                                         rounded_size - size));
 
-      release_scratch_register_on_entry (&sr);
+      /* This does not deallocate the space reserved for the scratch
+        register.  That will be deallocated in the epilogue.  */
+      release_scratch_register_on_entry (&sr, size);
     }
 
   /* Make sure nothing is scheduled before we are done.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr84128.c 
b/gcc/testsuite/gcc.target/i386/pr84128.c
new file mode 100644
index 0000000..a8323fd6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr84128.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -march=i686 -mtune=generic -fstack-clash-protection" } */
+/* { dg-require-effective-target ia32 } */
+
+__attribute__ ((noinline, noclone, weak, regparm (3)))
+int
+f1 (long arg0, int (*pf) (long, void *))
+{
+  unsigned char buf[32768];
+  return pf (arg0, buf);
+}
+
+__attribute__ ((noinline, noclone, weak))
+int
+f2 (long arg0, void *ignored)
+{
+  if (arg0 != 17)
+    __builtin_abort ();
+  return 19;
+}
+
+int
+main (void)
+{
+  if (f1 (17, f2) != 19)
+    __builtin_abort ();
+  return 0;
+}
+
+

Reply via email to