https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69882
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, I think what goes wrong is

t.f90:13:0: note: Detected interleaving load *a_19(D)[_54] and *a_19(D)[_18]
t.f90:13:0: note: Detected interleaving load of size 4 starting with _55 = *a_19(D)[_54];
t.f90:13:0: note: There is a gap of 2 elements after the group

but we end up loading 4 elements without handling the gap!

I have a patch (that also makes vectorizing the testcase no longer
profitable).  w/o cost model we get

.L5:
        vmovupd (%rcx), %xmm0
        addl    $1, %r9d
        addq    $64, %rcx
        vmovupd -32(%rcx), %xmm1
        vinsertf128     $0x1, -48(%rcx), %ymm0, %ymm0
        vinsertf128     $0x1, -16(%rcx), %ymm1, %ymm1
        cmpl    %edx, %r9d
        vinsertf128     $1, %xmm1, %ymm0, %ymm0
        vmaxpd  %ymm0, %ymm2, %ymm2
        jb      .L5

which indeed looks not too profitable to me.
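
For readers unfamiliar with grouped accesses, here is a rough C sketch of the access pattern the notes above describe: groups of 4 elements of which only the first 2 are read, leaving a gap of 2 at the end of each group. This is not the testcase from the report (that is Fortran and not shown here). The names max_used, scalar_extent and full_group_extent are made up for illustration, and reading the problem as "the last full-group load reaches past what the scalar loop touches" is an assumption about what "without handling the gap" means.

```c
#include <stddef.h>

/* Hypothetical layout: groups of 4 doubles, only the first 2 are read. */
enum { GROUP_SIZE = 4, USED = 2 };

/* Scalar reference loop: maximum over the 2 used elements of each group.
   Assumes n_groups >= 1. */
double max_used(const double *a, size_t n_groups)
{
    double m = a[0];
    for (size_t g = 0; g < n_groups; ++g)
        for (size_t k = 0; k < USED; ++k) {
            double v = a[g * GROUP_SIZE + k];
            if (v > m)
                m = v;
        }
    return m;
}

/* Number of elements the scalar loop can touch: it stops at the last
   used element of the last group.  Assumes n_groups >= 1. */
size_t scalar_extent(size_t n_groups)
{
    return (n_groups - 1) * GROUP_SIZE + USED;
}

/* Number of elements touched if every group is loaded in full,
   including the 2-element gap at its end. */
size_t full_group_extent(size_t n_groups)
{
    return n_groups * GROUP_SIZE;
}
```

With 2 groups, the scalar loop needs only 6 elements while loading both groups in full touches 8, so the final load would cover 2 elements the original loop never reads.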