Hello! After some more thinking and considering all recent discussion (thanks!), I am convinced that a slightly simplified original patch (attached), now one-liner, is the way to go.
Let's look at the following test: --cut here-- unsigned long foo (void) { return __builtin_ia32_readeflags_u64 (); } --cut here-- When the above builtin is present in the function, the function doesn't use redzone. As discussed in the original submission, x86 creates redzone based on crtl->sp_is_unchanging, cleared flag cancels rezone creation. The flag is cleared in stack-ptr-mod.cc, notice_stack_pointer_modification_1 when a) stack pointer is explicitly mentioned or b) when RTX_AUTOINC with SP is detected *in the output* of the RTX. The final RTX dump of the above function reads as: (insn:TI 5 2 6 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (unspec:DI [ (reg:CC 17 flags) ] UNSPEC_PUSHFL)) "unchan.c":6:10 80 {pushfldi2} (expr_list:REG_DEAD (reg:CC 17 flags) (nil))) (insn 6 5 12 2 (set (reg:DI 0 ax [99]) (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0 S8 A8])) "unchan.c":6:10 76 {*popdi1} (nil)) and notice_stack_pointer_modification_1 triggers on the first instruction. Now consider the following test: --cut here-- register unsigned long current_stack_pointer asm ("%rsp"); #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer) unsigned long bar (void) { unsigned long res; asm ("pushfq" : ASM_CALL_CONSTRAINT); asm ("popq %0" : "=r" (res), ASM_CALL_CONSTRAINT); return res; } --cut here-- The above function results in exactly the same asm as the one in the first test. However, notice_stack_pointer_modification_1 does not detect stack pointer access here, but it should. Here we *do* change SP, so IMHO this case addresses Andreas' remark that we shouldn't say we changed SP if we don't do it. Please note that due to inout constraint on %rsp, the scheduler won't reschedule insns even without "volatile" asm qualifier. Going a bit further, we can join the asm to form a new test: --cut here-- register unsigned long current_stack_pointer asm ("%rsp"); #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer) unsigned long baz (void) { unsigned long res; asm ("pushfq; popq %0" : "=r" (res), ASM_CALL_CONSTRAINT); return res; } --cut here-- here we inform the compiler that we read and write the rsp, not necessarily change it, maybe something like "add %rsp, $0". As above, notice_stack_pointer_modification_1 does not trigger here. When we consider the very first testcase, notice_stack_pointer_modification_1 is interested only in PUSH type instruction. It doesn't concern POP instruction, where SP gets modified, too. The approach and logic of *stack_ptr_mod pass in fact only looks if SP is mentioned in the output of the RTX, not what happens with it, and records its detection by clearing crtl->sp_is_unchanging. This brings us to the logic of my now one-liner patch. There is no difference in the assembly for any of the above tests, but in the later two tests the compiler does not detect that RSP is changed, although the access is correctly modeled with inout constraints (so SP hard register is mentioned in the output). There is a condition in the notice_stack_pointer_modification_1 that tries to detect this case, but it assumes that only one SP is defined and uses pointer comparison. Pointer comparison does not detect that "(reg/v:DI 7 sp [ current_stack_pointer ])" and "(reg/f:DI 7 sp)" RTXes actually all correspond to the same stack pointer register. The patch fixes this wrong assumption. The change is localized specifically to *stack_ptr_mod pass and doesn't affect anything else, but clears crtl->sp_is_unchanging for RTXes where global register "%rsp" is mentioned in the output, including: --cut here-- register unsigned long current_stack_pointer asm ("%rsp"); #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer) void test (void) { current_stack_pointer += 1; } --cut here which is currently not detected as the function where SP is changing, even though we have: (insn:TI 6 5 16 2 (parallel [ (set (reg/v:DI 7 sp [ current_stack_pointer ]) (plus:DI (reg/v:DI 7 sp [ current_stack_pointer ]) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) "unchan.c":6:25 284 {*adddi_1} Also, Jeff asked about validity of the register unsigned long current_stack_pointer asm ("%rsp"); #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer) If we want to confine e.g.: --cut here-- register unsigned long current_stack_pointer asm ("%rsp"); #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer) void c (void); void test (void) { asm ("call %c1" : ASM_CALL_CONSTRAINT : "i"(c)); } --cut here-- between frame setup and teardown (when compiled with -fno-omit-frame-pointer): 0000000000000010 <test>: 10: 55 push %rbp 11: 48 89 e5 mov %rsp,%rbp 14: e8 00 00 00 00 call 19 <test+0x9> 19: c9 leave 1a: c3 ret then we have to depend the asm on %rsp to avoid asm from being moved above insn @11 (output dependence) or below insn @19 ( == mov %rbp,%rsp, input dependence). There is no other way to prevent scheduler from moving the asm, we can't declare %rbp as global register when -fno-omit-frame-pointer is in effect. Based on the above discussion I propose a simplified patch (attached) that effectively prevents redzone creation when ASM_CALL_CONSTRAINT is used. I have tested it with gcc bootstrap and regtest, and since linux is a heavy user of the ASM_CALL_CONSTRAINT construct, also by compiling and booting the latest mainline kernel (Linux version 6.12.0-rc6-00004-g67f14dce9061). I have no means to test with FRED enabled kernel, but with the proposed change at least correctly tagged asm that changes SP won't be problematic when the kernel will be compiled with -mred-zone. Uros.
diff --git a/gcc/stack-ptr-mod.cc b/gcc/stack-ptr-mod.cc index ba38de54bb1..a522a67c0b4 100644 --- a/gcc/stack-ptr-mod.cc +++ b/gcc/stack-ptr-mod.cc @@ -34,7 +34,7 @@ static void notice_stack_pointer_modification_1 (rtx x, const_rtx pat ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { - if (x == stack_pointer_rtx + if (rtx_equal_p (x, stack_pointer_rtx) /* The stack pointer is only modified indirectly as the result of a push until later. See the comments in rtl.texi regarding Embedded Side-Effects on Addresses. */