https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111019
Bug ID: 111019
Summary: Optimizer incorrectly assumes variable is not changed while change happens through another pointer
Product: gcc
Version: 12.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: boskidialer at gmail dot com
Target Milestone: ---

Created attachment 55737
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55737&action=edit
Smallest reproduction I managed to create

Hello,

I was investigating a test failure in the product. The failure only
happens when compiling with -O3 or -O2; it does not happen with -O1 or
without any optimization.

GCC Version:

dashboard@dashboard-desktop:~$ /usr/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/bin/g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.3.0-1ubuntu1~23.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~23.04)

Reproduction:

/usr/bin/g++ gcc-err.cpp -O3 -o gcc-err.out && ./gcc-err.out
(gcc-err.cpp is provided as the attachment to the bug report).

The issue is that the generated executable hangs when compiled with -O3
or -O2, but not when compiled with -O1 or without any optimization.

To verify that the issue is not on my end, I pasted the reproduction
code and the required compiler flags into godbolt:
https://godbolt.org/z/Ez7vrz77W
It shows that the compiled program times out, which confirms that the
generated executable is stuck. After changing the compiler options on
the right side of the godbolt page to -O1, the code still compiles, but
the executable now finishes within the time limit and prints a single
line, "test".

Based on the debugging I did on this code, the problem looks to be in
Target::~Target, in the `while (this->next)` loop. I suspect the
optimizer incorrectly assumes that the value of `this->next` is
unchanged between iterations. That is not true: `n` is set to
`this->next`, which points to the second item in the doubly linked
list, so `n->previous == this`. The line `n->previous->next = ...`
therefore changes the value of `this->next`, only indirectly.

When generating the assembly for the reproduction with
`/usr/bin/g++ -masm=intel gcc-err.cpp -O3 -S -o gcc-err.S`, the
instructions produced seem to be incorrect: the loop never re-checks
whether the value of `this->next` has changed before the next
iteration:

.L21:
        mov     rcx, QWORD PTR [rax]
        mov     rdx, QWORD PTR 8[rax]
        test    rcx, rcx
        je      .L19                     // if (n->previous)
        mov     QWORD PTR 8[rcx], rdx
        mov     rdx, QWORD PTR 8[rax]    // n->previous->next = n->next;
.L19:
        test    rdx, rdx
        je      .L20                     // if (n->next)
        mov     QWORD PTR [rdx], rcx     // n->next->previous = n->previous;
.L20:
        xor     edx, edx
        movups  XMMWORD PTR [rax], xmm0
        mov     QWORD PTR 16[rax], rdx
        jmp     .L21

When an external function call, a compiler barrier (like
'asm volatile("":::"memory")') or more complex code is added to the
loop, the compiler produces the correct code:

.L18:
        mov     rax, QWORD PTR 8[rbx]
        test    rax, rax
        je      .L74                     // quits the loop if `this->next == nullptr`
        mov     rcx, QWORD PTR [rax]
        mov     rdx, QWORD PTR 8[rax]
        test    rcx, rcx
        je      .L19                     // if (n->previous)
        mov     QWORD PTR 8[rcx], rdx
        mov     rdx, QWORD PTR 8[rax]    // n->previous->next = n->next;
.L19:
        test    rdx, rdx
        je      .L20                     // if (n->next)
        mov     QWORD PTR [rdx], rcx     // n->next->previous = n->previous;
.L20:
        xor     edx, edx
        movups  XMMWORD PTR [rax], xmm0
        mov     QWORD PTR 16[rax], rdx