https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111097
--- Comment #3 from Fabio Cannizzo <fabio at cannizzo dot net> ---
Hi Richard,

With -fno-strict-aliasing the issue disappears; however, I am not sure whether that is a real fix or merely circumstantial. As mentioned, there are many other possible workarounds that also make the issue disappear.

The most curious case is the one below: p has type __m128i*, and simply casting it to XV* before the assignment is enough to make things stop working.

Worth noting: this code works correctly with Visual Studio and with clang.

template <int N, int JB>
static FORCE_INLINE void doLoop(__m128i*& p, __m128i& xC, __m128i& xD, __m128i bmask)
{
    auto pend = p + N;
    do {
        __m128i tmp = advance1(p[0], p[JB], xC, xD, bmask);
        xC = xD;
        xD = tmp;
#if 1
        // Store through an XV* cast: this variant shows the issue.
        struct XV {
            __m128i m_v;
            XV(__m128i v) : m_v(v) {}
        };
        ((XV *)p)[0] = tmp;
#else
        // Direct store through the __m128i*: this variant works.
        p[0] = tmp;
#endif
        ++p;
    } while (p != pend);
}
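
One way to test whether the XV* cast itself is what matters would be to keep the XV wrapper but perform the store with memcpy, so that no lvalue of type XV is ever formed over memory that holds __m128i objects. The following is only a minimal sketch under that assumption, not a confirmed fix: Vec16 is a hypothetical stand-in for __m128i (so it builds without the SSE headers), and store_via_memcpy and demo_store are illustrative names that do not appear in the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for __m128i: 16 bytes, 16-byte aligned.
struct alignas(16) Vec16 {
    std::uint32_t lane[4];
};

// Wrapper mirroring the XV struct from doLoop.
struct XV {
    Vec16 m_v;
    XV(Vec16 v) : m_v(v) {}
};

// Copy the wrapped value into *dst byte by byte.
// The destination is only ever accessed as a Vec16 (or as raw bytes),
// never through a pointer that was cast to XV*.
inline void store_via_memcpy(Vec16* dst, const XV& src) {
    assert(dst != nullptr);
    std::memcpy(dst, &src.m_v, sizeof(Vec16));
}

// Small driver: store one value and return the sum of its lanes.
inline std::uint32_t demo_store() {
    Vec16 buf[2] = {};
    Vec16 tmp = {{1, 2, 3, 4}};
    store_via_memcpy(&buf[0], XV(tmp));
    return buf[0].lane[0] + buf[0].lane[1] + buf[0].lane[2] + buf[0].lane[3];
}
```

In doLoop this would correspond to replacing ((XV *)p)[0] = tmp; with a memcpy into p. If the issue also disappears with that change and without -fno-strict-aliasing, that would suggest the cast-based store is the relevant factor, though it would not by itself show whether the original code is valid.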