https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111901
Bug ID: 111901
Summary: Apparently bogus CSE of inline asm with memory clobber
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: torva...@linux-foundation.org
Target Milestone: ---

This came up during an unrelated discussion about 'volatile' in the kernel with Uros Bizjak, both with regard to inline asms and to 'normal' volatile memory accesses. As part of that discussion I wrote a test program, and it shows unexpected (arguably very buggy) gcc optimization behavior.

The test is really simple:

    int test(void)
    {
        unsigned int sum = 0;
        for (int i = 0; i < 4; i++) {
            unsigned int val;
    #if ONE
            asm("magic1 %0" : "=r"(val) : : "memory");
    #else
            asm volatile("magic2 %0" : "=r"(val));
    #endif
            sum += val;
        }
        return sum;
    }

and if you compile this with

    gcc -O2 -S -DONE -funroll-all-loops t.c

the end result seems nonsensical. The over-eager CSE combines the non-volatile asm despite the fact that it has a "memory" clobber, which the gcc documentation describes as:

    The "memory" clobber tells the compiler that the assembly code performs
    memory reads or writes to items other than those listed in the input
    and output operands (for example, accessing the memory pointed to by
    one of the input parameters).

so combining the four identical asm statements into one seems actively buggy. The inline asm may not be marked volatile, but it does clearly tell the compiler that it does memory reads OR WRITES to operands other than those listed, which would on the face of it make the CSE invalid. Imagine, for example, that there is some added statistics gathering or similar going on inside the asm, maybe using the new Intel remote atomics.

The other oddity is that the result is not multiplied by four in any sane way. The above generates:

    magic1 %eax
    movl %eax, %edx
    addl %eax, %eax
    addl %edx, %eax
    addl %edx, %eax
    ret

but *without* the memory clobber it generates the much more obvious

    magic1 %eax
    sall $2, %eax
    ret

so the memory clobber does end up affecting the end result, just in a nonsensical way.

And no, I don't think we have cases of this kind of code in the kernel. Marking the asm volatile will obviously disable the over-eager CSE, and is certainly the natural thing to do for anything that actually modifies memory. But the memory clobber really does seem to make the CSE that gcc does invalid per the documentation, and Uros suggested I file a gcc bugzilla entry for this.

FYI: clang seems to generate the expected code for all cases (i.e. both "volatile" and the memory clobber disable CSE, and when it does CSE, it generates the expected single shift rather than individual additions). But clang verifies the assembler mnemonic (which I think is a misfeature - one of the *reasons* to use inline asm is to do things the compiler doesn't understand), so instead of "magicX" you need to use "strl" or something like that.
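
To make the "statistics gathering" concern concrete, here is a minimal sketch (not from the report itself; it assumes x86-64 and GNU assembler syntax, and the symbol event_count and the rdtsc/lock incl sequence are purely illustrative) of the kind of unlisted memory side effect the "memory" clobber is supposed to cover:

    unsigned long event_count;   /* hypothetical statistics counter */

    static inline unsigned int read_and_count(void)
    {
        unsigned int val;
        /* Produce a value and, as a side effect, bump event_count in memory.
         * The counter is intentionally not listed as an operand: only the
         * "memory" clobber tells the compiler that memory other than the
         * listed operands is read or written. */
        asm("rdtsc; lock incl event_count(%%rip)"
            : "=a"(val)
            : /* no inputs */
            : "edx", "memory");
        return val;
    }

    unsigned int test(void)
    {
        unsigned int sum = 0;
        /* If the four asm statements are CSE'd into one, event_count is
         * incremented once instead of four times - observably different
         * behavior, which is why the CSE looks invalid. */
        for (int i = 0; i < 4; i++)
            sum += read_and_count();
        return sum;
    }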