https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117202
--- Comment #5 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
FWIW, gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c seems to
produce similar VEC_PERM_EXPRs for SVE, but it works there.

The idea is that we're unpacking one vector of [16,16] chars into 4 vectors
of [16,16] chars (i.e. 16+16X chars).  Each output vector therefore gets
[16,16]/4==[4,4] elements from the shared input vector.

> In the assembly it's even more weird as the last two permutes seem to be
> CSEd to the second, resulting in the same mask used for the second to
> fourth masked store.

Yeah, that does seem wrong.  For SVE we get:

.L4:
        ld1b    z26.b, p7/z, [x1, x5]
        ld1b    z25.b, p7/z, [x2, x5]
        tbl     z3.b, {z26.b}, z30.b
        add     z3.b, z3.b, z31.b
        tbl     z2.b, {z25.b}, z30.b
        cmpne   p6.b, p7/z, z2.b, #0
        st1b    z3.b, p6, [x6]
        tbl     z1.b, {z26.b}, z29.b
        add     z1.b, z1.b, z31.b
        tbl     z0.b, {z25.b}, z29.b
        cmpne   p6.b, p7/z, z0.b, #0
        st1b    z1.b, p6, [x6, #1, mul vl]
        tbl     z24.b, {z26.b}, z28.b
        add     z24.b, z24.b, z31.b
        tbl     z23.b, {z25.b}, z28.b
        cmpne   p6.b, p7/z, z23.b, #0
        st1b    z24.b, p6, [x6, #2, mul vl]
        tbl     z26.b, {z26.b}, z27.b
        add     z26.b, z26.b, z31.b
        tbl     z25.b, {z25.b}, z27.b
        cmpne   p6.b, p7/z, z25.b, #0
        st1b    z26.b, p6, [x6, #3, mul vl]
        add     x6, x6, x8
        add     x5, x5, x7
        cmp     x9, x5
        bcs     .L4

which isn't pretty, but does have the expected number of TBLs.
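
As a rough illustration of the unpacking described above, here is a scalar C sketch of the selector each of the 4 output vectors would use.  It is not GCC code: unpack_index and build_selector are hypothetical names, and nelts stands for the runtime vector length 16+16X.  The sketch also assumes that each input element is replicated into 4 consecutive output lanes, which is how I read the permutes but is not spelled out in the comment itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not GCC code.  NELTS is the number of chars in one
   vector (16 + 16X for some runtime X >= 0).  Return the index of the
   shared input element read by lane LANE of output vector K, ASSUMING
   each input element is replicated into 4 consecutive output lanes.  */
static size_t
unpack_index (size_t nelts, size_t k, size_t lane)
{
  assert (nelts % 4 == 0 && k < 4 && lane < nelts);
  /* Output vector K covers input elements [K*NELTS/4, (K+1)*NELTS/4).  */
  return k * (nelts / 4) + lane / 4;
}

/* Fill SEL[0..NELTS-1] with the selector for output vector K.  */
static void
build_selector (size_t nelts, size_t k, size_t *sel)
{
  for (size_t lane = 0; lane < nelts; ++lane)
    sel[lane] = unpack_index (nelts, k, lane);
}
```

Under that assumption, nelts == 16 gives selectors starting { 0, 0, 0, 0, 1, ... }, { 4, 4, 4, 4, 5, ... }, { 8, ... } and { 12, ... }.  The four selectors differ for every vector length, so the last three permutes should not be CSEable with each other.  That is consistent with the four separate index registers (z30, z29, z28, z27) in the SVE loop.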