https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830
--- Comment #4 from Petr <kobalicek.petr at gmail dot com> ---
I think the test-case can be simplified to the following code. It still suffers
from the same issues as mentioned above.

#include <stdint.h>

#if defined(_MSC_VER)
# include <intrin.h>
#else
# include <x86intrin.h>
#endif

void transform(double* dst, const double* src, const double* matrix, size_t length) {
  intptr_t i = static_cast<intptr_t>(length);

  while ((i -= 2) >= 0) {
    __m256d s0 = _mm256_loadu_pd(src);
    _mm256_storeu_pd(dst, _mm256_add_pd(s0, s0));

    dst += 4;
    src += 4;
  }

  if (i & 1) {
    __m128d s0 = _mm_loadu_pd(src);
    _mm_storeu_pd(dst, _mm_add_pd(s0, s0));
  }
}
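
For reference, a scalar sketch of what the test-case above computes may help when comparing generated code. It assumes that `length` counts pairs of doubles, which is how the loop reads: each iteration consumes two pairs (four doubles) and the tail handles one leftover pair. The name `transform_scalar` is hypothetical and not part of the report, and the unused `matrix` parameter is dropped.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar equivalent of the test-case; not part of the report.
// Assumption: `length` counts pairs of doubles, so the buffers hold 2 * length doubles.
void transform_scalar(double* dst, const double* src, std::size_t length) {
  std::intptr_t i = static_cast<std::intptr_t>(length);

  // Main loop: two pairs (four doubles) per iteration, like the 256-bit path.
  while ((i -= 2) >= 0) {
    for (int k = 0; k < 4; ++k)
      dst[k] = src[k] + src[k];

    dst += 4;
    src += 4;
  }

  // After the loop i is -2 (even length) or -1 (odd length); the low bit
  // selects the tail of one pair, like the 128-bit path.
  if (i & 1) {
    for (int k = 0; k < 2; ++k)
      dst[k] = src[k] + src[k];
  }
}
```

With `length = 5` the main loop runs twice and the tail once, so ten doubles are written in total.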