https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025
Bug ID: 113025
Summary: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
Product: gcc
Version: 8.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: juki at gcc dot mail.kapsi.fi
Target Milestone: ---

The following code is miscompiled in some cases when the optimization level is -O3:

// from https://github.com/intel/ARM_NEON_2_x86_SSE/blob/master/NEON_2_SSE.h
#define LOAD_SI128(ptr) \
    ( ((uintptr_t)(ptr) & 15) == 0 ) ? _mm_load_si128((__m128i*)(ptr)) : _mm_loadu_si128((__m128i*)(ptr))

This macro is used by several operations in the linked header file that simulate ARM NEON intrinsics which load a 128-bit integer vector from unaligned memory addresses.

At low optimization levels, and most of the time at -O3 as well, the macro works as expected:

- If the pointer to the memory location is 16-byte aligned and the compiler knows this, it generates the opcode "movdqa", matching the _mm_load_si128() intrinsic.
- If the pointer has unknown or non-16-byte alignment, the opcode "movdqu", matching the _mm_loadu_si128() intrinsic, is generated and the actual alignment test is optimized away as unnecessary.

However, in some cases when the macro is used to load 1- or 2-byte aligned data, the 16-byte aligned opcode is generated instead and a General Protection Fault occurs due to invalid alignment. The function where this happens just gets a raw pointer, for example const uint8_t *, as an input, and the compiler should have no reason to assume that it is always 16-byte aligned.

The issue was first detected with gcc 8.4.0, but it was also verified to happen with gcc 9.4.0 and gcc 12.2.0, in different places depending on the version.

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 8.4.0-1ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04)

I was unable to produce a simple example in which this happens, and the complex examples are from proprietary code. Hopefully this still helps someone to understand the issue better.

What I do know at the moment:

- The bug happens at least with the C++ frontend when compiling for x86_64.
- The bug happens with and without LTO.
- The bug has only happened with -O3, never with -O2 or -O1.
- The bug seems to happen only in very specific cases, but it is common enough to crop up in several very different algorithms that use the operation above.
- Minor changes, like changing the inline keyword on a related function or changing -DNDEBUG on the command line to another setting, have the potential to "fix" the issue momentarily for that particular location.
- Only the first access in the generated function, at offset 0 from that pointer, is wrong. Later accesses with some variable offset added to that pointer again use unaligned access, as they should.
- 16-byte aligned access was assumed even when the parent function, in the same translation unit, was looping through different offsets in steps of 1 and calling the function with the miscompiled code. So the context gave no reason to assume 16-byte alignment for the pointer.
- All tested compilers from 8.4.0 to 12.2.0 produced the same error with the same compiler parameters, but the errors were not necessarily generated in the same functions. No GCC version from the tested set was found to produce only working code with full optimizations enabled. Clang does not seem to share this issue.

The only thing I can think of is that during some of the more aggressive optimization passes, the pointer somehow gets wrong alignment information attached to it. However, I do not know enough about GCC's internals to understand how this could happen.
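
For readers unfamiliar with the usage pattern described above, here is a minimal sketch of it. It is not a reproducer: no simple test case that triggers the bug is known, and this code is expected to behave correctly. The function names first_byte_via_load() and check_all_offsets() are hypothetical and exist only for illustration; only the LOAD_SI128 macro is taken from the report.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_load_si128, _mm_loadu_si128, _mm_storeu_si128

// Macro as quoted in the report (from NEON_2_SSE.h).
#define LOAD_SI128(ptr) \
    ( ((uintptr_t)(ptr) & 15) == 0 ) ? _mm_load_si128((__m128i*)(ptr)) : _mm_loadu_si128((__m128i*)(ptr))

// Hypothetical callee: receives a raw byte pointer with no alignment
// guarantee, loads 16 bytes through the macro and returns the first one.
static uint8_t first_byte_via_load(const uint8_t *src) {
    __m128i v = LOAD_SI128(src);
    uint8_t out[16];
    _mm_storeu_si128((__m128i*)out, v);  // unaligned store, always safe
    return out[0];
}

// Hypothetical caller: walks offsets 0..15 in steps of 1, so only
// offset 0 is 16-byte aligned. Returns how many loads gave the
// expected first byte (16 when every load is correct).
static int check_all_offsets() {
    alignas(16) uint8_t buf[32];
    for (int i = 0; i < 32; ++i) buf[i] = (uint8_t)i;
    int ok = 0;
    for (int off = 0; off < 16; ++off)
        ok += (first_byte_via_load(buf + off) == off);
    return ok;
}
```

In an affected build, the symptom would be a "movdqa" in the disassembly of the callee (for example as shown by objdump -d) where "movdqu" or a runtime alignment test is expected.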