Hello!

After some more thinking and considering all recent discussion
(thanks!), I am convinced that a slightly simplified original patch
(attached), now one-liner, is the way to go.

Let's look at the following test:

--cut here--
unsigned long foo (void)
{
  return __builtin_ia32_readeflags_u64 ();
}
--cut here--

When the above builtin is present in the function, the function
doesn't use redzone. As discussed in the original submission, x86
creates redzone based on crtl->sp_is_unchanging, cleared flag cancels
rezone creation. The flag is cleared in stack-ptr-mod.cc,
notice_stack_pointer_modification_1 when a) stack pointer is
explicitly mentioned or b) when RTX_AUTOINC with SP is detected *in
the output* of the RTX. The final RTX dump of the above function reads
as:

(insn:TI 5 2 6 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A8])
        (unspec:DI [
                (reg:CC 17 flags)
            ] UNSPEC_PUSHFL)) "unchan.c":6:10 80 {pushfldi2}
     (expr_list:REG_DEAD (reg:CC 17 flags)
        (nil)))
(insn 6 5 12 2 (set (reg:DI 0 ax [99])
        (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0  S8 A8]))
"unchan.c":6:10 76 {*popdi1}
     (nil))

and notice_stack_pointer_modification_1 triggers on the first instruction.

Now consider the following test:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

unsigned long bar (void)
{
  unsigned long res;

  asm ("pushfq" : ASM_CALL_CONSTRAINT);
  asm ("popq %0" : "=r" (res), ASM_CALL_CONSTRAINT);
  return res;
}
--cut here--

The above function results in exactly the same asm as the one in the
first test. However, notice_stack_pointer_modification_1 does not
detect stack pointer access here, but it should. Here we *do* change
SP, so IMHO this case addresses Andreas' remark that we shouldn't say
we changed SP if we don't do it. Please note that due to inout
constraint on %rsp, the scheduler won't reschedule insns even without
"volatile" asm qualifier.

Going a bit further, we can join the asm to form a new test:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

unsigned long baz (void)
{
  unsigned long res;

  asm ("pushfq; popq %0" : "=r" (res), ASM_CALL_CONSTRAINT);
  return res;
}
--cut here--

here we inform the compiler that we read and write the rsp, not
necessarily change it, maybe something like "add %rsp, $0". As above,
notice_stack_pointer_modification_1 does not trigger here.

When we consider the very first testcase,
notice_stack_pointer_modification_1 is interested only in PUSH type
instruction. It doesn't concern POP instruction, where  SP gets
modified, too. The approach and logic of *stack_ptr_mod pass in fact
only looks if SP is mentioned in the output of the RTX, not what
happens with it, and records its detection by clearing
crtl->sp_is_unchanging.

This brings us to the logic of my now one-liner patch. There is no
difference in the assembly for any of the above tests, but in the
later two tests the compiler does not detect that RSP is changed,
although the access  is correctly modeled with inout constraints (so
SP hard register is mentioned in the output). There is a condition in
the notice_stack_pointer_modification_1 that tries to detect this
case, but it assumes that only one SP is defined and uses pointer
comparison. Pointer comparison does not detect that "(reg/v:DI 7 sp [
current_stack_pointer ])" and "(reg/f:DI 7 sp)" RTXes actually all
correspond to the same stack pointer register.

The patch fixes this wrong assumption. The change is localized
specifically to *stack_ptr_mod pass and doesn't affect anything else,
but clears crtl->sp_is_unchanging for RTXes where global register
"%rsp" is mentioned in the output, including:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

void test (void)
{
  current_stack_pointer += 1;
}
--cut here

which is currently not detected as the function where SP is changing,
even though we have:

(insn:TI 6 5 16 2 (parallel [
            (set (reg/v:DI 7 sp [ current_stack_pointer ])
                (plus:DI (reg/v:DI 7 sp [ current_stack_pointer ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) "unchan.c":6:25 284 {*adddi_1}

Also, Jeff asked about validity of the

register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

If we want to confine e.g.:

--cut here--
register unsigned long current_stack_pointer asm ("%rsp");
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)

void c (void);

void test (void)
{
  asm ("call %c1" : ASM_CALL_CONSTRAINT : "i"(c));
}
--cut here--

between frame setup and teardown (when compiled with -fno-omit-frame-pointer):

0000000000000010 <test>:
 10:   55                      push   %rbp
 11:   48 89 e5                mov    %rsp,%rbp
 14:   e8 00 00 00 00          call   19 <test+0x9>
 19:   c9                      leave
 1a:   c3                      ret

then we have to depend the asm on %rsp to avoid asm from being moved
above insn @11 (output dependence) or below insn @19 ( == mov
%rbp,%rsp, input dependence). There is no other way to prevent
scheduler from moving the asm, we can't declare %rbp as global
register when -fno-omit-frame-pointer is in effect.

Based on the above discussion I propose a simplified patch (attached)
that effectively prevents redzone creation when ASM_CALL_CONSTRAINT is
used. I have tested it with gcc bootstrap and regtest, and since linux
is a heavy user of the ASM_CALL_CONSTRAINT construct, also by
compiling and booting the latest mainline kernel (Linux version
6.12.0-rc6-00004-g67f14dce9061). I have no means to test with FRED
enabled kernel, but with the proposed change at least correctly tagged
asm that changes SP won't be problematic when the kernel will be
compiled with -mred-zone.

Uros.
diff --git a/gcc/stack-ptr-mod.cc b/gcc/stack-ptr-mod.cc
index ba38de54bb1..a522a67c0b4 100644
--- a/gcc/stack-ptr-mod.cc
+++ b/gcc/stack-ptr-mod.cc
@@ -34,7 +34,7 @@ static void
 notice_stack_pointer_modification_1 (rtx x, const_rtx pat ATTRIBUTE_UNUSED,
                                     void *data ATTRIBUTE_UNUSED)
 {
-  if (x == stack_pointer_rtx
+  if (rtx_equal_p (x, stack_pointer_rtx)
       /* The stack pointer is only modified indirectly as the result
         of a push until later.  See the comments in rtl.texi
         regarding Embedded Side-Effects on Addresses.  */

Reply via email to