On 3/3/2026 1:40 AM, Jakub Jelinek wrote:
Hi!

This PR is about an inconsistency between AT&T and Intel syntax
for output_adjust_stack_and_probe/output_probe_stack_range.
On ia32 they use orl and or DWORD PTR respectively, i.e. 32-bit or,
but on x86_64 in AT&T syntax they use orq (i.e. 64-bit or) and
in Intel syntax they use or DWORD PTR (i.e. 32-bit or).
These cases are used when probing stack in a loop, for each
page one probe.  There is also the probe_stack named pattern
which currently uses word_mode or (i.e. 64-bit or for x86_64)
for both syntaxes, used when probing only once.

Functionally, I think whether we do an 8-bit or 32-bit or 64-bit
or with 0 constant doesn't matter, we don't modify any values on the
stack, just pretend to modify it.  The 8-bit and 32-bit ors
are 1-byte shorter though than 64-bit one.  How the 3 behave
performance-wise is unknown, if the particular probed spot on the
stack hasn't been stored/read for a while and won't be for a while,
then I'd think it shouldn't matter, dunno if there can be store
forwarding effects if it has been e.g. written or read very recently
by some other function as say 32-bit access and now is 8-bit.  The
access after the probe (if it happens soon enough) should be in valid
programs a store (and again, dunno if there can be issues if the
sizes are different).
I wouldn't worry about STLF here since in a conforming program there should be an additional store to the same memory location that initializes the memory location for the user's code.  I'm not going to stress about the performance of an uninitialized memory read when such things exist :-)
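
The size remark above (8-bit and 32-bit ors being 1 byte shorter than the 64-bit one) can be checked with a small sketch. The byte strings below are hand-assembled from the standard opcode forms (80 /1 ib, 83 /1 ib, and REX.W + 83 /1 ib) for a probe at (%rsp) with no displacement. They are an illustration, not output taken from the patch or from the assembler, and the names PROBE_ENCODINGS and encoding_sizes are made up for this example.

```python
# Hand-assembled x86-64 encodings of a zero-OR probe at the top of the stack.
# Layout: [REX.W] opcode ModRM SIB imm8
# ModRM 0x0c = mod 00, reg /1 (OR), rm 100 (SIB follows); SIB 0x24 = base %rsp.
PROBE_ENCODINGS = {
    "orb $0, (%rsp)": bytes.fromhex("800c2400"),    # 80 /1 ib
    "orl $0, (%rsp)": bytes.fromhex("830c2400"),    # 83 /1 ib
    "orq $0, (%rsp)": bytes.fromhex("48830c2400"),  # REX.W + 83 /1 ib
}


def encoding_sizes(encodings):
    """Return a mapping from instruction text to encoded length in bytes."""
    return {insn: len(code) for insn, code in encodings.items()}


if __name__ == "__main__":
    for insn, size in encoding_sizes(PROBE_ENCODINGS).items():
        print(f"{insn}: {size} bytes")
```

The only difference between the 32-bit and 64-bit forms here is the REX.W prefix byte. Probes with a displacement from the stack pointer would be longer in all three forms, but the 1-byte gap stays the same.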


Now, for consistency reasons, we could just make the Intel
syntax match the AT&T and use 64-bit or on x86_64, so
use QWORD PTR instead of DWORD PTR if stack_pointer_rtx is 64-bit
in those 2 functions and be done with it.

Another possibility is to always use 32-bit ors (in both those 2 functions
and probe_stack*; similar to the posted patch except testsuite changes
aren't needed and s/{b}/{l}/g;s/QI/SI/g;s/BYTE PTR/DWORD PTR/g) and
last option is to always use 8-bit ors (which is what the following
patch does).  Or some other mix, say use 32-bit ors for -Os/-Oz and
64-bit ors otherwise.
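
As a rough illustration of the 32-bit alternative, the three substitutions quoted above can be applied to a dual-dialect template string. SAMPLE_TEMPLATE is a hypothetical string written in the {AT&T|Intel} style for this example, not a line copied from the patch, and byte_to_dword is a made-up helper name.

```python
# Hypothetical output template in GCC's {att|intel} dual-dialect style.
# It stands in for the 8-bit variant; it is not copied from the patch.
SAMPLE_TEMPLATE = r"or{b}\t{%1, (%0)|BYTE PTR [%0], %1}"


def byte_to_dword(text):
    """Apply s/{b}/{l}/g; s/QI/SI/g; s/BYTE PTR/DWORD PTR/g to text."""
    return (
        text.replace("{b}", "{l}")             # AT&T size suffix
            .replace("QI", "SI")               # machine mode names, e.g. QImode
            .replace("BYTE PTR", "DWORD PTR")  # Intel operand size
    )


if __name__ == "__main__":
    print(byte_to_dword(SAMPLE_TEMPLATE))
```

Plain string replacement is enough here because none of the three patterns uses regex metacharacters in the sed sense.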

Bootstrapped/regtested on x86_64-linux and i686-linux.

2026-03-03  Jakub Jelinek  <[email protected]>

        PR target/124336
        * config/i386/i386.cc (output_adjust_stack_and_probe): Use
        or{b} rather than or%z0 and BYTE PTR rather than DWORD PTR.
        (output_probe_stack_range): Likewise.
        * config/i386/i386.md (probe_stack): Pass just 2 arguments
        to gen_probe_stack_1, first adjust_address to QImode, second
        const0_rtx.
        (@probe_stack_1_<mode>): Remove.
        (probe_stack_1): New define_insn.
OK.  As has been noted elsewhere in this thread, a testb may be the best choice for stack clash protection, but perhaps not for Ada's stack check.  So this seems like a nice safe fix from my point of view.  We can always follow up with testb or similar improvements in gcc-17.


Jeff
