https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117202

            Bug ID: 117202
           Summary: SLP permutation for VLA vectors broken
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When enabling non-store-lane operation for
gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c the testcase
FAILs execution.

   node 0x5bfda70 (max_nunits=16, refcnt=2) vector([16,16]) signed char
   op: VEC_PERM_EXPR
        stmt 0 _3 = *_2;
        stmt 1 _3 = *_2;
        stmt 2 _3 = *_2;
        stmt 3 _3 = *_2;
        lane permutation { 0[0] 0[0] 0[0] 0[0] }
        children 0x5bfdb08
   node 0x5bfdb08 (max_nunits=16, refcnt=2) vector([16,16]) signed char
   op template: _3 = *_2;
        stmt 0 _3 = *_2;

is the SLP node we generate code for:

   vectorizing permutation 0x5bfda70 op0[0] op0[0] op0[0] op0[0] (repeat 4)

   as vops0[0][0] vops0[0][0] vops0[0][0] vops0[0][0], vops0[0][1] vops0[0][1]
vops0[0][1] vops0[0][1], vops0[0][2] vops0[0][2] vops0[0][2] vops0[0][2],
vops0[0][[4,4]] vops0[0][[4,4]] vops0[0][[4,4]] vops0[0][[4,4]],
vops0[0][[5,4]] vops0[0][[5,4]] vops0[0][[5,4]] vops0[0][[5,4]],
vops0[0][[6,4]] vops0[0][[6,4]] vops0[0][[6,4]] vops0[0][[6,4]],
vops0[0][[8,8]] vops0[0][[8,8]] vops0[0][[8,8]] vops0[0][[8,8]],
vops0[0][[9,8]] vops0[0][[9,8]] vops0[0][[9,8]] vops0[0][[9,8]],
vops0[0][[10,8]] vops0[0][[10,8]] vops0[0][[10,8]] vops0[0][[10,8]],
vops0[0][[12,12]] vops0[0][[12,12]] vops0[0][[12,12]] vops0[0][[12,12]],
vops0[0][[13,12]] vops0[0][[13,12]] vops0[0][[13,12]] vops0[0][[13,12]],
vops0[0][[14,12]] vops0[0][[14,12]] vops0[0][[14,12]] vops0[0][[14,12]]
   add new stmt: _109 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108, { 0,
0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, ... }>;
   add new stmt: _110 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108, {
POLY_INT_CST [4, 4], POLY_INT_CST [4, 4], POLY_INT_CST [4, 4], POLY_INT_CST [4,
4], POLY_INT_CST [5, 4], POLY_INT_CST [5, 4], POLY_INT_CST [5, 4], POLY_INT_CST
[5, 4], POLY_INT_CST [6, 4], POLY_INT_CST [6, 4], POLY_INT_CST [6, 4],
POLY_INT_CST [6, 4], ... }>;
   add new stmt: _111 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108, {
POLY_INT_CST [8, 8], POLY_INT_CST [8, 8], POLY_INT_CST [8, 8], POLY_INT_CST [8,
8], POLY_INT_CST [9, 8], POLY_INT_CST [9, 8], POLY_INT_CST [9, 8], POLY_INT_CST
[9, 8], POLY_INT_CST [10, 8], POLY_INT_CST [10, 8], POLY_INT_CST [10, 8],
POLY_INT_CST [10, 8], ... }>;
   add new stmt: _112 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108, {
POLY_INT_CST [12, 12], POLY_INT_CST [12, 12], POLY_INT_CST [12, 12],
POLY_INT_CST [12, 12], POLY_INT_CST [13, 12], POLY_INT_CST [13, 12],
POLY_INT_CST [13, 12], POLY_INT_CST [13, 12], POLY_INT_CST [14, 12],
POLY_INT_CST [14, 12], POLY_INT_CST [14, 12], POLY_INT_CST [14, 12], ... }>;

but the permutations in the VEC_PERM_EXPRs look odd - the first one,
{ 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, ... } seems obviously good,
the second one should be the same, offset by POLY_INT_CST [16, 16],
so shouldn't this be { POLY_INT_CST [16, 16], ..., POLY_INT_CST [17, 16], ...
}?
Instead we even have different multipliers?!
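
To make the indices above easier to compare, here is a minimal C sketch that
evaluates a POLY_INT_CST [a, b] as a + b * x for a concrete runtime value x.
The helper names (poly_eval, replicated_source) are hypothetical, and the
"replicate each input element 4 times across the output vectors" model in
replicated_source is an assumption made for illustration, not something taken
from the vectorizer sources.

```c
#include <stdio.h>

/* Value of POLY_INT_CST [a, b] for a given runtime indeterminate x,
   i.e. a + b * x.  */
static long poly_eval(long a, long b, long x)
{
    return a + b * x;
}

/* ASSUMPTION (illustrative model only): lane `lane` of output vector `vec`
   reads input element (vec * nunits + lane) / repeat, i.e. each input
   element is replicated `repeat` times across the concatenated outputs.  */
static long replicated_source(long vec, long lane, long nunits, long repeat)
{
    return (vec * nunits + lane) / repeat;
}

/* Print, for one value of x, the first index of each dumped permute
   ([0,0], [4,4], [8,8], [12,12]) next to the model's first index.  */
static void print_bases(long x)
{
    long nunits = poly_eval(16, 16, x);   /* vector([16,16]) signed char */
    printf("x=%ld nunits=%ld\n", x, nunits);
    for (long vec = 0; vec < 4; vec++)
        printf("  vector %ld: dumped base %ld, model base %ld\n",
               vec, poly_eval(4 * vec, 4 * vec, x),
               replicated_source(vec, 0, nunits, 4));
}
```

Calling print_bases(0) and print_bases(1) prints both sets of bases side by
side. For x = 0 the vector has 16 lanes, [4, 4] evaluates to 4 and [16, 16]
evaluates to 16. Whether the model above matches the intended semantics of the
permute is exactly the open question here.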

In the assembly it's even weirder, as the last two permutes seem to be
CSEd to the second, so the same mask is used for the second to
fourth masked stores.
