https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69882

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, I think what goes wrong is

t.f90:13:0: note: Detected interleaving load *a_19(D)[_54] and *a_19(D)[_18]
t.f90:13:0: note: Detected interleaving load of size 4 starting with _55 = *a_19(D)[_54];
t.f90:13:0: note: There is a gap of 2 elements after the group

but we end up loading 4 elements without handling the gap!
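
To illustrate why the gap matters, here is a minimal C sketch of the kind of
access pattern the dump describes. It is not the testcase from this PR: the
names max_pairs, scalar_extent and full_vector_extent are hypothetical, and
the layout (groups of 4 doubles of which only the first 2 are read) is an
assumption taken from the "size 4" / "gap of 2" notes above.

```c
#include <stddef.h>

/* Hypothetical reduction: max over the first two doubles of every group
   of four.  Elements 2 and 3 of each group are the "gap" and are never
   read by the scalar code.  Requires ngroups >= 1.  */
double max_pairs(const double *a, size_t ngroups)
{
    double m = a[0];
    for (size_t i = 0; i < ngroups; i++) {
        const double *g = a + 4 * i;
        if (g[0] > m) m = g[0];
        if (g[1] > m) m = g[1];
    }
    return m;
}

/* Number of elements the scalar loop needs the array to have:
   the last group only contributes its first two elements.  */
size_t scalar_extent(size_t ngroups)
{
    return ngroups ? 4 * (ngroups - 1) + 2 : 0;
}

/* Number of elements touched if every group is loaded as a full
   4-element vector, i.e. if the trailing gap is not accounted for.  */
size_t full_vector_extent(size_t ngroups)
{
    return 4 * ngroups;
}
```

Under these assumptions, an array that is exactly large enough for the scalar
loop is 2 elements shorter than what unconditional 4-element loads would
touch, so the final vector load could read past the end of the array unless
the gap is handled.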

I have a patch (that also makes vectorizing the testcase no longer
profitable).  Without the cost model we get

.L5:
        vmovupd (%rcx), %xmm0
        addl    $1, %r9d
        addq    $64, %rcx
        vmovupd -32(%rcx), %xmm1
        vinsertf128     $0x1, -48(%rcx), %ymm0, %ymm0
        vinsertf128     $0x1, -16(%rcx), %ymm1, %ymm1
        cmpl    %edx, %r9d
        vinsertf128     $1, %xmm1, %ymm0, %ymm0
        vmaxpd  %ymm0, %ymm2, %ymm2
        jb      .L5

which indeed looks not too profitable to me.
