https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116367

            Bug ID: 116367
           Summary: Handle vector shuffles better
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Apologies for the broad summary; feel free to make it more targeted.
Testcase:
void test(short * restrict x, short * restrict y) {
    const char table[] = {7, 6, 5, 4, 3, 2, 1, 0};
    for (int i = 0; i < 8; i++) {
        y[i] = x[table[i]];
    }
}

Compiled with -Ofast on aarch64 gives:
test:
        ldr     q31, [x0]
        adrp    x2, .LC0
        ldr     q29, [x2, #:lo12:.LC0]
        mov     v30.16b, v31.16b
        tbl     v30.16b, {v30.16b - v31.16b}, v29.16b
        str     q30, [x1]
        ret
.LC0:
        .byte   14
        .byte   15
        .byte   12
        .byte   13
        .byte   10
        .byte   11
        .byte   8
        .byte   9
        .byte   6
        .byte   7
        .byte   4
        .byte   5
        .byte   2
        .byte   3
        .byte   0
        .byte   1

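LLVM does better. It recognizes the permutation as a full reversal of the eight 16-bit lanes, so it needs neither the constant-pool load nor TBL: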
test:
        ldr     q0, [x0]
        rev64   v0.8h, v0.8h
        ext     v0.16b, v0.16b, v0.16b, #8
        str     q0, [x1]
        ret
