https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120284
Bug ID: 120284
Summary: inline assembly operand constraint not comply with document
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: huiba....@alibaba-inc.com
Target Milestone: ---

I'm writing a benchmark program, and I need gcc to assume that a variable changes in each loop iteration, so as to suppress some optimizations and make the benchmark closer to a real-life scenario. I wrote it like this:

```
void test_foo(Obj* obj, const uint32_t* x) {
    for (int i = 0; i < N; ++i) {
        asm volatile("" : "=r"(x) : "r"(x));
        uint32_t result = obj->foo(x);
        asm volatile("" : : "r"(result));
    }
}
```

I get a segmentation fault with some implementations of foo() at -O3 on x86_64, using gcc 13.3.0 and 14.2.0. The issue doesn't occur with -O2 or with clang 18.1.3. I disassembled the test_foo() function, which has foo() inlined:

```
<+0>:    endbr64
<+4>:    pushq  %rbp
<+5>:    movq   %rsp, %rbp
<+8>:    pushq  %r15
<+10>:   pushq  %r14
<+12>:   pushq  %r13
<+14>:   pushq  %r12
<+16>:   pushq  %rbx
<+17>:   movq   %rdi, %r12
<+20>:   movq   %rsi, %r15
<+23>:   andq   $-0x40, %rsp
<+27>:   subq   $0x8, %rsp
<+31>:   movl   $0x5f5e100, -0x48(%rsp)   ; imm = 0x5F5E100
<+39>:   vpmovsxbd 0x56a940(%rdi), %zmm1
<+49>:   vmovdqa32 (%rdi), %zmm0
<+55>:   nopw   (%rax,%rax)
<+64>:   movq   %rax, %r15                # change it to movq %r15, %rax
<+67>:   vpbroadcastd (%rax), %zmm17
<+73>:   vpbroadcastd 0x4(%rax), %zmm16
<+80>:   vpbroadcastd 0x8(%rax), %zmm15
<+87>:   vpbroadcastd 0xc(%rax), %zmm14
<+94>:   vpbroadcastd 0x10(%rax), %zmm13
<+101>:  vpbroadcastd 0x14(%rax), %zmm12
<+108>:  vpbroadcastd 0x18(%rax), %zmm11
<+115>:  vpbroadcastd 0x1c(%rax), %zmm10
<+122>:  vpbroadcastd 0x20(%rax), %zmm9
<+129>:  vpbroadcastd 0x24(%r15), %zmm8
<+136>:  vpbroadcastd 0x28(%r15), %zmm7
<+143>:  vpbroadcastd 0x2c(%r15), %zmm6
...
<+4464>: decl   -0x48(%rsp)
<+4468>: jne    0x1870                    ; <+64>
<+4474>: vzeroupper
<+4477>: leaq   -0x28(%rbp), %rsp
<+4481>: popq   %rbx
<+4482>: popq   %r12
<+4484>: popq   %r13
<+4486>: popq   %r14
<+4488>: popq   %r15
<+4490>: popq   %rbp
<+4491>: retq
```

It looks like gcc is misusing a pair of registers in the instruction at <+64>: %rax is read there before anything in the function has written it. When I manually swap the two registers, the program seems to run correctly.

If I change the first asm statement to

```
asm volatile("" : "+r"(x));
```

that is, use the "+r" constraint instead of "=r" plus "r", the program also runs correctly. I think these two forms are identical, as stated in the documentation:

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Output-Operands

"When using ‘=’, do not assume the location contains the existing value on entry to the asm, except when the operand is tied to an input;".

So there seems to be a bug in the frontend, which doesn't forward the constraints correctly to the backend.

The command line is "g++-14 x.cpp -O3 -march=native". I'm using Ubuntu 24.04 on x86_64 (AMD EPYC 9T24), and the compilers were installed with apt.

BTW, I also found some sub-optimal code in the assembly:

(1) The move between %r15 and %rax at <+64> seems unnecessary, as the following lines could just use %r15. (And why does the code use both of them?)

(2) The move from %rsi to %r15 at <+20> seems unnecessary, as the following lines could just use %rsi.
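
For comparison, here is a minimal sketch of the two spellings in which the output operand is guaranteed to start out holding the input value: the "+r" form mentioned above, and "=r" with the input tied to it through the matching constraint "0". The helper names (opaque_rw, opaque_tied, check_opaque_forms) are hypothetical and not part of the original test case. With a separate "r" input, as in test_foo(), my understanding is that the compiler is free to place the input and the output in different registers; whether that form should nevertheless behave like "+r" is the question raised in this report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: "+r" declares one read-write operand, so the asm
// receives the current value of p and the compiler must assume it changed.
inline const std::uint32_t* opaque_rw(const std::uint32_t* p) {
    asm volatile("" : "+r"(p));
    return p;
}

// Hypothetical helper: "=r" output with the input tied to operand 0 via the
// matching constraint "0", so input and output share the same register.
inline const std::uint32_t* opaque_tied(const std::uint32_t* p) {
    asm volatile("" : "=r"(p) : "0"(p));
    return p;
}

// Sanity check: the asm body is empty, so both forms must hand back the
// pointer they were given.
inline void check_opaque_forms(const std::uint32_t* p) {
    assert(opaque_rw(p) == p);
    assert(opaque_tied(p) == p);
}
```

Either helper could replace the first asm statement in the loop, e.g. `x = opaque_tied(x);`, assuming the goal is only to hide the value of x from the optimizer.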