https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65068

Wilco <wdijkstr at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wdijkstr at arm dot com

--- Comment #3 from Wilco <wdijkstr at arm dot com> ---
Note that gcc.target/aarch64/ldp_vec_64_1.c shows the same issue. Again most
cores produce inefficient code, and only -mcpu=vulcan gets close:

foo:
        add     x1, x1, 8
        add     x2, x0, 8192
        .p2align 3
.L2:
        ldr     d0, [x1, -8]
        ldr     d1, [x1], 16
        add     v0.2s, v0.2s, v1.2s
        str     d0, [x0], 16
        cmp     x2, x0
        bne     .L2
        ret

That still shows an odd pre-increment of x1 ahead of the loop (add x1, x1, 8),
so the two loads address [x1, -8] and a post-indexed [x1], 16, which blocks the
use of ldp.
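
For reference, here is a minimal C sketch of a loop that would produce the
access pattern in the assembly above. It is reconstructed from the assembly
(two adjacent 8-byte loads, one 8-byte store, both pointers advancing by 16,
8192 bytes in total), not copied from the testsuite, so the type name and
parameter names are assumptions and the actual test may differ in detail.

```c
/* 64-bit vector of two ints (V2SI), using the GCC vector extension.
   The typedef name is illustrative, not taken from the test.  */
typedef int v2si __attribute__ ((vector_size (8)));

/* Sum each adjacent pair of source vectors into every second
   destination slot: 512 iterations, 16 bytes per iteration.  */
void
foo (v2si *dst, const v2si *src)
{
  for (int i = 0; i < 1024; i += 2)
    dst[i] = src[i] + src[i + 1];
}
```

The two loads of src[i] and src[i + 1] are adjacent in memory, so ideally they
would be combined into a single ldp d0, d1, [x1] with one post-increment.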

Also, in all cases aarch64_legitimize_address is called with offset 32760 on
V2SI. This offset does not occur anywhere in the source, so it should not
matter how we split it. However, what we do for that case affects IVOpt, which
is clearly a bug.
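
One possible reading of the 32760 value (an assumption, not something stated
above): it equals 4095 * 8, the largest offset an 8-byte LDR/STR can encode in
its unsigned scaled 12-bit immediate, so it may be a probe of the addressing
range rather than a real address. The sketch below only checks that arithmetic
against the LDP/STP range (signed 7-bit immediate scaled by the access size).
The helper names are hypothetical and are not GCC functions.

```c
#include <stdbool.h>

/* Largest offset for LDR/STR (unsigned immediate): a 12-bit field
   scaled by the access size in bytes.  Hypothetical helper.  */
static long
ldr_max_offset (long size)
{
  return 4095 * size;
}

/* Whether OFFSET fits LDP/STP: a signed 7-bit field scaled by the
   access size, i.e. a multiple of SIZE in [-64 * SIZE, 63 * SIZE].
   Hypothetical helper.  */
static bool
ldp_offset_ok (long offset, long size)
{
  return offset % size == 0
	 && offset >= -64 * size
	 && offset <= 63 * size;
}
```

With size 8, ldr_max_offset gives 32760, which ldp_offset_ok rejects, since
the LDP range for 8-byte accesses is only -512 to 504.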
