https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117202
Bug ID: 117202
Summary: SLP permutation for VLA vectors broken
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

When enabling non-store-lane operation for
gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c the testcase
FAILs execution.  The SLP nodes involved are

   node 0x5bfda70 (max_nunits=16, refcnt=2) vector([16,16]) signed char
   op: VEC_PERM_EXPR
        stmt 0 _3 = *_2;
        stmt 1 _3 = *_2;
        stmt 2 _3 = *_2;
        stmt 3 _3 = *_2;
        lane permutation { 0[0] 0[0] 0[0] 0[0] }
        children 0x5bfdb08
   node 0x5bfdb08 (max_nunits=16, refcnt=2) vector([16,16]) signed char
   op template: _3 = *_2;
        stmt 0 _3 = *_2;

and this is the code we generate for them:

vectorizing permutation 0x5bfda70 op0[0] op0[0] op0[0] op0[0] (repeat 4)
   as vops0[0][0] vops0[0][0] vops0[0][0] vops0[0][0],
      vops0[0][1] vops0[0][1] vops0[0][1] vops0[0][1],
      vops0[0][2] vops0[0][2] vops0[0][2] vops0[0][2],
      vops0[0][[4,4]] vops0[0][[4,4]] vops0[0][[4,4]] vops0[0][[4,4]],
      vops0[0][[5,4]] vops0[0][[5,4]] vops0[0][[5,4]] vops0[0][[5,4]],
      vops0[0][[6,4]] vops0[0][[6,4]] vops0[0][[6,4]] vops0[0][[6,4]],
      vops0[0][[8,8]] vops0[0][[8,8]] vops0[0][[8,8]] vops0[0][[8,8]],
      vops0[0][[9,8]] vops0[0][[9,8]] vops0[0][[9,8]] vops0[0][[9,8]],
      vops0[0][[10,8]] vops0[0][[10,8]] vops0[0][[10,8]] vops0[0][[10,8]],
      vops0[0][[12,12]] vops0[0][[12,12]] vops0[0][[12,12]] vops0[0][[12,12]],
      vops0[0][[13,12]] vops0[0][[13,12]] vops0[0][[13,12]] vops0[0][[13,12]],
      vops0[0][[14,12]] vops0[0][[14,12]] vops0[0][[14,12]] vops0[0][[14,12]]
add new stmt: _109 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108,
   { 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, ... }>;
add new stmt: _110 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108,
   { POLY_INT_CST [4, 4], POLY_INT_CST [4, 4], POLY_INT_CST [4, 4],
     POLY_INT_CST [4, 4], POLY_INT_CST [5, 4], POLY_INT_CST [5, 4],
     POLY_INT_CST [5, 4], POLY_INT_CST [5, 4], POLY_INT_CST [6, 4],
     POLY_INT_CST [6, 4], POLY_INT_CST [6, 4], POLY_INT_CST [6, 4], ... }>;
add new stmt: _111 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108,
   { POLY_INT_CST [8, 8], POLY_INT_CST [8, 8], POLY_INT_CST [8, 8],
     POLY_INT_CST [8, 8], POLY_INT_CST [9, 8], POLY_INT_CST [9, 8],
     POLY_INT_CST [9, 8], POLY_INT_CST [9, 8], POLY_INT_CST [10, 8],
     POLY_INT_CST [10, 8], POLY_INT_CST [10, 8], POLY_INT_CST [10, 8], ... }>;
add new stmt: _112 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108,
   { POLY_INT_CST [12, 12], POLY_INT_CST [12, 12], POLY_INT_CST [12, 12],
     POLY_INT_CST [12, 12], POLY_INT_CST [13, 12], POLY_INT_CST [13, 12],
     POLY_INT_CST [13, 12], POLY_INT_CST [13, 12], POLY_INT_CST [14, 12],
     POLY_INT_CST [14, 12], POLY_INT_CST [14, 12], POLY_INT_CST [14, 12], ... }>;

but the permutations in the VEC_PERM_EXPRs look odd.  The first one,
{ 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, ... }, seems obviously good.  The
second one should be the same, offset by POLY_INT_CST [16,16], so shouldn't
this be { POLY_INT_CST [16, 16], ..., POLY_INT_CST [17, 16], ... }?  Instead
we even have different multipliers?!

The assembly is even weirder: the last two permutes seem to be CSEd to the
second, so the same mask is used for the second through fourth masked
stores.
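
For reference, here is a minimal sketch (plain Python, not GCC code) that evaluates the dumped selectors at concrete vector lengths. It assumes POLY_INT_CST [a, b] denotes a + b*x for a runtime indeterminate x >= 0, and that vector([16,16]) therefore has 16 + 16*x elements. How x maps to the hardware vector length is target-specific and not stated in this report. The helper names `poly`, `leading` and `evaluate` are made up for illustration, and only the leading selector elements shown in the dump are modelled.

```python
def poly(a, b, x):
    """Value of POLY_INT_CST [a, b] for indeterminate x (assumed a + b*x)."""
    return a + b * x


def leading(base_a, base_b):
    """First 12 selector elements as (constant, coefficient) pairs.

    Mirrors the dump layout: each index is repeated 4 times, and the
    constant term steps by 1 while the coefficient stays fixed.
    """
    return [(base_a + i // 4, base_b) for i in range(12)]


def evaluate(selector, x):
    """Turn a list of (constant, coefficient) pairs into plain integers."""
    return [poly(a, b, x) for a, b in selector]


# Selectors exactly as they appear in the dump above.
DUMPED = {
    "_109": leading(0, 0),
    "_110": leading(4, 4),
    "_111": leading(8, 8),
    "_112": leading(12, 12),
}

# Expectation voiced in the report for the second permute; only the
# [16,16] and [17,16] elements are stated there, so only 8 are modelled.
EXPECTED_110 = leading(16, 16)[:8]

for x in (0, 1):
    print(f"x = {x}: vector has {poly(16, 16, x)} elements")
    for name, selector in DUMPED.items():
        print(f"  {name} dumped:   {evaluate(selector, x)}")
    print(f"  _110 expected: {evaluate(EXPECTED_110, x)}")
```

Comparing the "dumped" and "expected" rows for _110 at each x makes the mismatch discussed above concrete: the dumped selector starts at 4 + 4*x, the expected one at 16 + 16*x.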