https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120284

            Bug ID: 120284
           Summary: inline assembly operand constraint not comply with
                    document
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: huiba....@alibaba-inc.com
  Target Milestone: ---

I'm writing a benchmark program, and I need gcc to assume that a variable
changes in each loop iteration, so as to prevent some optimizations and make the
benchmark closer to a real-life scenario. I wrote it like this:

```
void test_foo(Obj* obj, const uint32_t* x) {
  for (int i = 0; i < N; ++i) {
    asm volatile("" : "=r"(x) : "r"(x));
    uint32_t result = obj->foo(x);
    asm volatile("" : : "r"(result));
  }
}
```

I get a segmentation fault with some implementations of foo() at -O3 on
x86_64, using gcc 13.3.0 and 14.2.0. The issue doesn't occur with -O2 or with
clang 18.1.3. I disassembled the test_foo() function, which has foo() inlined:

```
<+0>:    endbr64
<+4>:    pushq  %rbp
<+5>:    movq   %rsp, %rbp
<+8>:    pushq  %r15
<+10>:   pushq  %r14
<+12>:   pushq  %r13
<+14>:   pushq  %r12
<+16>:   pushq  %rbx
<+17>:   movq   %rdi, %r12
<+20>:   movq   %rsi, %r15
<+23>:   andq   $-0x40, %rsp
<+27>:   subq   $0x8, %rsp
<+31>:   movl   $0x5f5e100, -0x48(%rsp) ; imm = 0x5F5E100
<+39>:   vpmovsxbd 0x56a940(%rdi), %zmm1
<+49>:   vmovdqa32 (%rdi), %zmm0
<+55>:   nopw   (%rax,%rax)

<+64>:   movq   %rax, %r15      # change it to movq   %r15, %rax

<+67>:   vpbroadcastd (%rax), %zmm17
<+73>:   vpbroadcastd 0x4(%rax), %zmm16
<+80>:   vpbroadcastd 0x8(%rax), %zmm15
<+87>:   vpbroadcastd 0xc(%rax), %zmm14
<+94>:   vpbroadcastd 0x10(%rax), %zmm13
<+101>:  vpbroadcastd 0x14(%rax), %zmm12
<+108>:  vpbroadcastd 0x18(%rax), %zmm11
<+115>:  vpbroadcastd 0x1c(%rax), %zmm10
<+122>:  vpbroadcastd 0x20(%rax), %zmm9
<+129>:  vpbroadcastd 0x24(%r15), %zmm8
<+136>:  vpbroadcastd 0x28(%r15), %zmm7
<+143>:  vpbroadcastd 0x2c(%r15), %zmm6

...

<+4464>: decl   -0x48(%rsp)
<+4468>: jne    0x1870         ; <+64>
<+4474>: vzeroupper
<+4477>: leaq   -0x28(%rbp), %rsp
<+4481>: popq   %rbx
<+4482>: popq   %r12
<+4484>: popq   %r13
<+4486>: popq   %r14
<+4488>: popq   %r15
<+4490>: popq   %rbp
<+4491>: retq
```

I found that gcc is probably misusing a pair of registers in the instruction
located at <+64>. When I manually swap the 2 registers, the program seems to
run correctly. If I change the 1st asm statement to
```asm volatile("" : "+r"(x));```, using the "+r" constraint instead of "=r"
and "r", the program also runs correctly. I think these 2 forms are equivalent,
as stated in the documentation:

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Output-Operands

"When using ‘=’, do not assume the location contains the existing value on
entry to the asm, except when the operand is tied to an input;".

So there seems to be a bug in the frontend, which doesn't forward the
constraints correctly to the backend. The command line is
"g++-14 x.cpp -O3 -march=native". I'm using Ubuntu 24.04 on x86_64
(AMD EPYC 9T24), and the compilers were installed with apt.
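
For reference, a minimal sketch of the forms discussed above. The helper names
launder_rw and launder_tied are hypothetical and not part of the benchmark. The
first uses the "+r" read-write constraint mentioned above. The second uses a
matching constraint ("0"), which is the syntax the GCC documentation uses for
tying an input to an output operand. With separate "=r" and "r" operands,
nothing in the constraint strings themselves ties the two, so this may be the
relevant difference:

```cpp
#include <cstdint>

// Hypothetical helper (not from the benchmark): read-write operand.
// "+r" makes x a single operand that is both read and written, so the
// same register holds x on entry to and on exit from the empty asm.
inline const std::uint32_t* launder_rw(const std::uint32_t* x) {
  asm volatile("" : "+r"(x));
  return x;
}

// Hypothetical helper (not from the benchmark): matching constraint.
// "0" requires the input to be placed in the same location as output
// operand 0, which ties the input to the output.
inline const std::uint32_t* launder_tied(const std::uint32_t* x) {
  asm volatile("" : "=r"(x) : "0"(x));
  return x;
}
```

Both helpers return the pointer they were given, while the compiler has to
assume that the value may have been changed by the asm statement.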


BTW, I also found some sub-optimal code in the assembly:

(1) the move from %r15 to %rax at <+64> seems unnecessary, as %r15 could be
used directly in the following lines (and why are both of them used?);

(2) the move from %rsi to %r15 at <+20> seems unnecessary, as %rsi could be
used directly in the following lines.
