https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117202

--- Comment #5 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
FWIW, gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c seems to
produce similar VEC_PERM_EXPRs for SVE, but it works there.

The idea is that we're unpacking one vector of [16,16] chars into 4 vectors of
[16,16] chars (i.e. 16+16X chars).  Each output vector therefore gets
[16,16]/4==[4,4] elements from the shared input vector.
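
To make the arithmetic concrete, here is a minimal sketch in Python rather than GCC's own poly_int code. The Poly2 class and its method names are hypothetical, as is the assumption that output vector k draws on a contiguous block of the input vector. X is the runtime multiplier from the "16+16X" notation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poly2:
    """Hypothetical stand-in for a two-coefficient poly_int: a + b*X."""
    a: int
    b: int

    def exact_div(self, d: int) -> "Poly2":
        # Only meaningful when both coefficients are multiples of d.
        assert self.a % d == 0 and self.b % d == 0
        return Poly2(self.a // d, self.b // d)

    def at(self, x: int) -> int:
        # Concrete element count for a given runtime multiplier X.
        return self.a + self.b * x


nelts = Poly2(16, 16)             # [16,16] chars per vector
per_output = nelts.exact_div(4)   # share of the input used by each output

for x in range(3):
    share = per_output.at(x)
    # Assumption: output vector k reads input elements
    # [k*share, (k+1)*share - 1]; the real selectors may differ.
    ranges = [(k * share, (k + 1) * share - 1) for k in range(4)]
    print(f"X={x}: {nelts.at(x)} chars/vector, {share} per output, {ranges}")
```

Under that assumption the four shares tile the input vector exactly, so each of the four permutes needs its own selector.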

> In the assembly it's even more weird as the last two permutes seem to be
> CSEd to the second, resulting in the same mask used for the second to
> fourth masked store.
Yeah, that does seem wrong.  For SVE we get:

.L4:
        ld1b    z26.b, p7/z, [x1, x5]
        ld1b    z25.b, p7/z, [x2, x5]
        tbl     z3.b, {z26.b}, z30.b
        add     z3.b, z3.b, z31.b
        tbl     z2.b, {z25.b}, z30.b
        cmpne   p6.b, p7/z, z2.b, #0
        st1b    z3.b, p6, [x6]
        tbl     z1.b, {z26.b}, z29.b
        add     z1.b, z1.b, z31.b
        tbl     z0.b, {z25.b}, z29.b
        cmpne   p6.b, p7/z, z0.b, #0
        st1b    z1.b, p6, [x6, #1, mul vl]
        tbl     z24.b, {z26.b}, z28.b
        add     z24.b, z24.b, z31.b
        tbl     z23.b, {z25.b}, z28.b
        cmpne   p6.b, p7/z, z23.b, #0
        st1b    z24.b, p6, [x6, #2, mul vl]
        tbl     z26.b, {z26.b}, z27.b
        add     z26.b, z26.b, z31.b
        tbl     z25.b, {z25.b}, z27.b
        cmpne   p6.b, p7/z, z25.b, #0
        st1b    z26.b, p6, [x6, #3, mul vl]
        add     x6, x6, x8
        add     x5, x5, x7
        cmp     x9, x5
        bcs     .L4

which isn't pretty, but does have the expected number of TBLs.
