https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91142
Bug ID: 91142
Summary: Incorrect aligned vector load instruction emitted because of vinserti32x4 elision
Product: gcc
Version: 9.1.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

Testcase (cf. https://godbolt.org/z/xBEtqT):

#include <x86intrin.h>

alignas(32) long mem[100] = {};

__m128i f() {
  __m128i r{};
  __builtin_memcpy(&r, &mem[1], sizeof(r));
  return r;
}

__m512i g() {
  return _mm512_inserti32x4(__m512i(), f(), 0);
}

Compile with `-O2 -march=knl` or skylake-avx512. `g()` will incorrectly be translated to an aligned load on GCC 9.1.0, even though it correctly translates `f()` to an unaligned load.

The issue is not present on GCC trunk. Also GCC 8 and below didn't implement PR85480, which introduced the optimization to elide the vinserti32x4 instruction.
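
For illustration, here is a minimal sketch of why an aligned load is wrong for this testcase. It is not part of the original report: `misalignment` is a hypothetical helper introduced only here. It reuses the `mem` declaration from the testcase and computes how far an address lies past the previous alignment boundary.

```cpp
#include <cstddef>
#include <cstdint>

// Same declaration as in the testcase: the array base is 32-byte aligned.
alignas(32) long mem[100] = {};

// Hypothetical helper (an assumption for illustration, not from the report):
// returns the distance in bytes from p back to the previous
// `alignment`-byte boundary; 0 means p is aligned.
std::size_t misalignment(const void* p, std::size_t alignment) {
  return static_cast<std::size_t>(
      reinterpret_cast<std::uintptr_t>(p) % alignment);
}
```

`misalignment(&mem[0], 32)` is 0, but `misalignment(&mem[1], 16)` equals `sizeof(long)` (8 on the usual x86-64 Linux ABI). The source of the `memcpy` in `f()` is therefore not 16-byte aligned, so an aligned 16-byte vector load from that address would fault, and only the unaligned form is valid.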