https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116367
Bug ID: 116367
Summary: Handle vector shuffles better
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---

Apologies for the broad summary; feel free to make it more targeted.

Testcase:

void test(short * restrict x, short * restrict y)
{
  const char table[] = {7, 6, 5, 4, 3, 2, 1, 0};
  for (int i = 0; i < 8; i++)
    {
      y[i] = x[table[i]];
    }
}

Compiled with -Ofast on aarch64, GCC gives:

test:
        ldr     q31, [x0]
        adrp    x2, .LC0
        ldr     q29, [x2, #:lo12:.LC0]
        mov     v30.16b, v31.16b
        tbl     v30.16b, {v30.16b - v31.16b}, v29.16b
        str     q30, [x1]
        ret
.LC0:
        .byte   14
        .byte   15
        .byte   12
        .byte   13
        .byte   10
        .byte   11
        .byte   8
        .byte   9
        .byte   6
        .byte   7
        .byte   4
        .byte   5
        .byte   2
        .byte   3
        .byte   0
        .byte   1

LLVM does better. The permutation is a reversal of the eight 16-bit elements (y[i] = x[7 - i]), and LLVM implements it with REV64 and EXT instead of loading a TBL index vector from the constant pool:

test:
        ldr     q0, [x0]
        rev64   v0.8h, v0.8h
        ext     v0.16b, v0.16b, v0.16b, #8
        str     q0, [x1]
        ret