https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65068
Wilco <wdijkstr at arm dot com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |wdijkstr at arm dot com

--- Comment #3 from Wilco <wdijkstr at arm dot com> ---
Note gcc.target/aarch64/ldp_vec_64_1.c shows the same issue. Same story there:
most cores produce inefficient code, and only -mcpu=vulcan gets close:

foo:
        add     x1, x1, 8
        add     x2, x0, 8192
        .p2align 3
.L2:
        ldr     d0, [x1, -8]
        ldr     d1, [x1], 16
        add     v0.2s, v0.2s, v1.2s
        str     d0, [x0], 16
        cmp     x2, x0
        bne     .L2
        ret

That still shows an odd pre-increment in the loop header, which blocks the use
of ldp. Also, in all cases aarch64_legitimize_address is called with offset
32760 on V2SI. This offset does not occur anywhere in the source, so it should
not matter how we split it; however, what we do for that case affects IVOpt,
which is clearly a bug.