https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79830
--- Comment #4 from Petr <kobalicek.petr at gmail dot com> ---
I think the test-case can be simplified to the following code. It still suffers
from the same issues as mentioned above.
#include <stddef.h>
#include <stdint.h>
#if defined(_MSC_VER)
# include <intrin.h>
#else
# include <x86intrin.h>
#endif
void transform(double* dst, const double* src, const double* matrix, size_t length) {
  intptr_t i = static_cast<intptr_t>(length);
  // Main loop: each step of 2 in `i` consumes 4 doubles (one __m256d).
  while ((i -= 2) >= 0) {
    __m256d s0 = _mm256_loadu_pd(src);
    _mm256_storeu_pd(dst, _mm256_add_pd(s0, s0));
    dst += 4;
    src += 4;
  }
  // Tail: an odd `length` leaves 2 more doubles (one __m128d).
  if (i & 1) {
    __m128d s0 = _mm_loadu_pd(src);
    _mm_storeu_pd(dst, _mm_add_pd(s0, s0));
  }
}
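
For comparing the generated code or checking results, a scalar sketch of what the function above appears to compute may help. It is not part of the original test case: the name `transform_ref` is hypothetical, the unused `matrix` parameter is dropped, and it assumes `length` counts pairs of doubles (the loop advances `i` by 2 while the pointers advance by 4).

```cpp
#include <stddef.h>

// Hypothetical scalar reference, not part of the original test case.
// Assumption: `length` counts pairs of doubles, so 2 * length doubles
// are read from `src` and written to `dst`.
void transform_ref(double* dst, const double* src, size_t length) {
  for (size_t i = 0; i < length * 2; i++) {
    dst[i] = src[i] + src[i];  // same operation as _mm256_add_pd(s0, s0)
  }
}
```

Under that assumption, calling both functions on the same input should produce identical output, including for odd values of `length`.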