https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111019

            Bug ID: 111019
           Summary: Optimizer incorrectly assumes variable is not changed
                    while change happens through another pointer
           Product: gcc
           Version: 12.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: boskidialer at gmail dot com
  Target Milestone: ---

Created attachment 55737
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55737&action=edit
Smallest reproduction I managed to create

Hello,

I was investigating a test failure in the product that only occurs when
compiling with -O3 or -O2; it does not occur with -O1 or with optimization
disabled.

GCC Version:

dashboard@dashboard-desktop:~$ /usr/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/bin/g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
12.3.0-1ubuntu1~23.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~23.04)

Reproduction:

/usr/bin/g++ gcc-err.cpp -O3 -o gcc-err.out && ./gcc-err.out

(gcc-err.cpp is provided as the attachment to the bug report).

The issue is that the generated executable hangs when compiled with -O3 or -O2,
but not when compiled with -O1 or without any optimization.

To verify that the issue is not specific to my setup, I pasted the reproduction
code and the required compiler flags into godbolt:
https://godbolt.org/z/Ez7vrz77W - it shows that the compiled program times out,
which confirms that the generated executable is stuck. After changing the
compiler options on the right side of the godbolt page to -O1, the code still
compiles, but the executable now finishes within the time limit and outputs a
single line, "test".

Based on the debugging I did, the problem appears to be in the Target::~Target
code, which contains a `while (this->next)` loop. I suspect the compiler or
optimizer incorrectly assumes that the value of `this->next` is unchanged
between iterations. That is not true: in this case the variable `n` is set to
`this->next`, which points to the second item in the doubly linked list, so
`n->previous == this`, and the `n->previous->next = ...` line therefore changes
the value of `this->next` indirectly.
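
To make the aliasing concrete, here is a minimal sketch of a single iteration
of the loop body. It is not the attached gcc-err.cpp: the `Node` type and the
`unlink_next` and `demo_alias` helpers are hypothetical names made up for this
illustration, and the code is only meant to show that the store through
`n->previous->next` lands on the same memory as `this->next`.

```cpp
#include <cassert>

// Hypothetical node type; the real attachment is not reproduced here.
struct Node {
    Node* previous = nullptr;
    Node* next = nullptr;
};

// One iteration of the unlink loop body described in the report.
// Precondition: self->next is not null. Returns the node that was unlinked.
inline Node* unlink_next(Node* self) {
    Node* n = self->next;
    if (n->previous)
        n->previous->next = n->next;      // n->previous == self: writes self->next
    if (n->next)
        n->next->previous = n->previous;
    n->previous = nullptr;
    n->next = nullptr;
    return n;
}

// Builds head <-> a <-> b and checks that every step changes head.next,
// even though the function never assigns to self->next by name.
inline bool demo_alias() {
    Node head, a, b;
    head.next = &a; a.previous = &head;
    a.next = &b;    b.previous = &a;

    bool aliased = (head.next->previous == &head);
    unlink_next(&head);
    bool advanced = (head.next == &b) && (b.previous == &head);
    unlink_next(&head);
    bool emptied = (head.next == nullptr);
    return aliased && advanced && emptied;
}
```

Because every step rewrites `head.next`, a loop conditioned on that field has to
reload it on each iteration.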

When generating the assembly from the given reproduction using `/usr/bin/g++
-masm=intel gcc-err.cpp -O3 -S -o gcc-err.S`, the instructions produced seem to
be incorrect: the loop never re-checks whether the value of `this->next` has
changed before starting the next iteration:

.L21:
        mov     rcx, QWORD PTR [rax]
        mov     rdx, QWORD PTR 8[rax]
        test    rcx, rcx
        je      .L19                   // if (n->previous)
        mov     QWORD PTR 8[rcx], rdx
        mov     rdx, QWORD PTR 8[rax]  //   n->previous->next = n->next;
.L19:
        test    rdx, rdx
        je      .L20                   // if (n->next)
        mov     QWORD PTR [rdx], rcx   //   n->next->previous = n->previous;
.L20:
        xor     edx, edx
        movups  XMMWORD PTR [rax], xmm0
        mov     QWORD PTR 16[rax], rdx
        jmp     .L21

When an external function call, a compiler barrier (such as 'asm
volatile("":::"memory")'), or more complex code is added to the loop, the
correct code is generated:

.L18:
        mov     rax, QWORD PTR 8[rbx]
        test    rax, rax
        je      .L74                   // quits the loop if `this->next == nullptr`
        mov     rcx, QWORD PTR [rax]
        mov     rdx, QWORD PTR 8[rax]
        test    rcx, rcx
        je      .L19                   // if (n->previous)
        mov     QWORD PTR 8[rcx], rdx
        mov     rdx, QWORD PTR 8[rax]  //   n->previous->next = n->next;
.L19:
        test    rdx, rdx
        je      .L20                   // if (n->next)
        mov     QWORD PTR [rdx], rcx   //   n->next->previous = n->previous;
.L20:
        xor     edx, edx
        movups  XMMWORD PTR [rax], xmm0
        mov     QWORD PTR 16[rax], rdx
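
For reference, the barrier variant mentioned above can be sketched roughly as
follows. This is again not the attached file: `Item`, `drain` and `demo_drain`
are hypothetical names, the loop lives in a free function rather than in a
destructor, and `asm volatile("" ::: "memory")` is a GNU extension (GCC and
Clang), so treat it as an illustration of where the barrier goes rather than a
recommended fix.

```cpp
// Hypothetical node type standing in for the one in the attachment.
struct Item {
    Item* previous = nullptr;
    Item* next = nullptr;
};

// Unlinks every item after `self`; returns how many items were removed.
inline int drain(Item* self) {
    int removed = 0;
    while (self->next) {
        Item* n = self->next;
        if (n->previous)
            n->previous->next = n->next;   // may alias self->next
        if (n->next)
            n->next->previous = n->previous;
        n->previous = nullptr;
        n->next = nullptr;
        ++removed;
        // Compiler-level barrier (GNU extension): emits no instruction, but
        // the compiler must assume memory changed, so self->next is reloaded.
        asm volatile("" ::: "memory");
    }
    return removed;
}

// Builds head <-> a <-> b, drains the list and reports the removed count.
inline int demo_drain() {
    Item head, a, b;
    head.next = &a; a.previous = &head;
    a.next = &b;    b.previous = &a;
    return drain(&head);
}
```

The barrier emits no machine instruction; it only stops the compiler from
keeping `self->next` cached across iterations.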
