https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901
Bug ID: 111901
Summary: Apparently bogus CSE of inline asm with memory clobber
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: torva...@linux-foundation.org
Target Milestone: ---

This came up during an unrelated discussion about 'volatile' in the kernel with Uros Bizjak, both with regard to inline asms and to 'normal' volatile memory accesses. As part of that discussion I wrote a test program, and it shows unexpected (arguably very buggy) gcc optimization behavior.

The test is really simple:

    int test(void)
    {
        unsigned int sum = 0;
        for (int i = 0; i < 4; i++) {
            unsigned int val;
    #if ONE
            asm("magic1 %0" : "=r"(val) : : "memory");
    #else
            asm volatile("magic2 %0" : "=r"(val));
    #endif
            sum += val;
        }
        return sum;
    }

and if you compile this with

    gcc -O2 -S -DONE -funroll-all-loops t.c

the end result seems nonsensical. The over-eager CSE combines the non-volatile asm despite the fact that it has a "memory" clobber, which the gcc documentation describes as:

    The "memory" clobber tells the compiler that the assembly code performs
    memory reads or writes to items other than those listed in the input
    and output operands (for example, accessing the memory pointed to by
    one of the input parameters).

so combining the four identical asm statements into one seems actively buggy. The inline asm may not be marked volatile, but it does clearly tell the compiler that it does memory reads OR WRITES to operands other than those listed, which would on the face of it make the CSE invalid. Imagine, for example, that there is some added statistics gathering or similar going on inside the asm, maybe using the new Intel remote atomics.

The other oddity is that the result is not multiplied by four in any sane way. The above generates:

    magic1 %eax
    movl %eax, %edx
    addl %eax, %eax
    addl %edx, %eax
    addl %edx, %eax
    ret

but *without* the memory clobber it generates the much more obvious

    magic1 %eax
    sall $2, %eax
    ret

so the memory clobber does end up affecting the end result, just in a nonsensical way.

And no, I don't think we have cases of this kind of code in the kernel. Marking the asm volatile will obviously disable the over-eager CSE, and is certainly the natural thing to do for anything that actually modifies memory. But the memory clobber really does seem to make the CSE that gcc does invalid per the documentation, and Uros suggested I file a gcc bugzilla entry for this.

FYI: clang seems to generate the expected code for all cases (i.e. both "volatile" and the memory clobber disable CSE, and when it does CSE, it generates the expected single shift rather than individual additions). But clang verifies the assembler mnemonic (which I think is a misfeature - one of the *reasons* to use inline asm is to do things the compiler doesn't understand), so instead of "magicX" you need to use "strl" or something like that.
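
To make the "statistics gathering" concern concrete, here is a minimal sketch (not from the report itself; it assumes x86-64 and GNU assembler syntax, and the symbol event_count and the rdtsc/lock incl sequence are purely illustrative) of the kind of unlisted memory side effect the "memory" clobber is supposed to cover:

    unsigned long event_count;   /* hypothetical statistics counter */

    static inline unsigned int read_and_count(void)
    {
        unsigned int val;
        /* Produce a value and, as a side effect, bump event_count in memory.
         * The counter is intentionally not listed as an operand: only the
         * "memory" clobber tells the compiler that memory other than the
         * listed operands is read or written. */
        asm("rdtsc; lock incl event_count(%%rip)"
            : "=a"(val)
            : /* no inputs */
            : "edx", "memory");
        return val;
    }

    unsigned int test(void)
    {
        unsigned int sum = 0;
        /* If the four asm statements are CSE'd into one, event_count is
         * incremented once instead of four times - observably different
         * behavior, which is why the CSE looks invalid. */
        for (int i = 0; i < 4; i++)
            sum += read_and_count();
        return sum;
    }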