[Bug tree-optimization/117202] New: SLP permutation for VLA vectors broken

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 18 Oct 2024 01:32:49 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117202


            Bug ID: 117202
           Summary: SLP permutation for VLA vectors broken
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When enabling non-store-lanes operation for
gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c, the testcase
FAILs at execution time.

   node 0x5bfda70 (max_nunits=16, refcnt=2) vector([16,16]) signed char
   op: VEC_PERM_EXPR
        stmt 0 _3 = *_2;
        stmt 1 _3 = *_2;
        stmt 2 _3 = *_2;
        stmt 3 _3 = *_2;
        lane permutation { 0[0] 0[0] 0[0] 0[0] }
        children 0x5bfdb08
   node 0x5bfdb08 (max_nunits=16, refcnt=2) vector([16,16]) signed char
   op template: _3 = *_2;
        stmt 0 _3 = *_2;

is what we generate code for:

   vectorizing permutation 0x5bfda70 op0[0] op0[0] op0[0] op0[0] (repeat 4)

   as vops0[0][0] vops0[0][0] vops0[0][0] vops0[0][0],
      vops0[0][1] vops0[0][1] vops0[0][1] vops0[0][1],
      vops0[0][2] vops0[0][2] vops0[0][2] vops0[0][2],
      vops0[0][[4,4]] vops0[0][[4,4]] vops0[0][[4,4]] vops0[0][[4,4]],
      vops0[0][[5,4]] vops0[0][[5,4]] vops0[0][[5,4]] vops0[0][[5,4]],
      vops0[0][[6,4]] vops0[0][[6,4]] vops0[0][[6,4]] vops0[0][[6,4]],
      vops0[0][[8,8]] vops0[0][[8,8]] vops0[0][[8,8]] vops0[0][[8,8]],
      vops0[0][[9,8]] vops0[0][[9,8]] vops0[0][[9,8]] vops0[0][[9,8]],
      vops0[0][[10,8]] vops0[0][[10,8]] vops0[0][[10,8]] vops0[0][[10,8]],
      vops0[0][[12,12]] vops0[0][[12,12]] vops0[0][[12,12]] vops0[0][[12,12]],
      vops0[0][[13,12]] vops0[0][[13,12]] vops0[0][[13,12]] vops0[0][[13,12]],
      vops0[0][[14,12]] vops0[0][[14,12]] vops0[0][[14,12]] vops0[0][[14,12]]
   add new stmt: _109 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108, {
       0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, ... }>;
   add new stmt: _110 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108, {
       POLY_INT_CST [4, 4], POLY_INT_CST [4, 4], POLY_INT_CST [4, 4], POLY_INT_CST [4, 4],
       POLY_INT_CST [5, 4], POLY_INT_CST [5, 4], POLY_INT_CST [5, 4], POLY_INT_CST [5, 4],
       POLY_INT_CST [6, 4], POLY_INT_CST [6, 4], POLY_INT_CST [6, 4], POLY_INT_CST [6, 4],
       ... }>;
   add new stmt: _111 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108, {
       POLY_INT_CST [8, 8], POLY_INT_CST [8, 8], POLY_INT_CST [8, 8], POLY_INT_CST [8, 8],
       POLY_INT_CST [9, 8], POLY_INT_CST [9, 8], POLY_INT_CST [9, 8], POLY_INT_CST [9, 8],
       POLY_INT_CST [10, 8], POLY_INT_CST [10, 8], POLY_INT_CST [10, 8], POLY_INT_CST [10, 8],
       ... }>;
   add new stmt: _112 = VEC_PERM_EXPR <vect__3.607_108, vect__3.607_108, {
       POLY_INT_CST [12, 12], POLY_INT_CST [12, 12], POLY_INT_CST [12, 12], POLY_INT_CST [12, 12],
       POLY_INT_CST [13, 12], POLY_INT_CST [13, 12], POLY_INT_CST [13, 12], POLY_INT_CST [13, 12],
       POLY_INT_CST [14, 12], POLY_INT_CST [14, 12], POLY_INT_CST [14, 12], POLY_INT_CST [14, 12],
       ... }>;

but the permutation masks in the VEC_PERM_EXPRs look odd. The first one,
{ 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, ... }, seems obviously good. The second
one should be the same, offset by POLY_INT_CST [16, 16], so shouldn't it be
{ POLY_INT_CST [16, 16], ..., POLY_INT_CST [17, 16], ... }?
Instead the masks even use different multipliers (the second POLY_INT_CST
coefficient is 4, 8 and 12 respectively)?!
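A small decoding aid, not part of the original report: it models POLY_INT_CST [a, b] as a + b*x, where x is the runtime vector-length multiple (so vector([16,16]) has 16 + 16*x lanes), and expands the four printed selectors into concrete lane indices for a given x. It assumes, based on the 12 values printed before each "...", that every selector is encoded as 4 interleaved patterns of 3 leading elements, with elements 1, 2, 3, ... of each pattern forming an arithmetic series. The helper names (poly_eval, printed, expand_sel) are hypothetical, not GCC functions.

```python
# Decoding aid for the VEC_PERM_EXPR selectors in the dump above.
# Assumption: POLY_INT_CST [a, b] is modelled as the tuple (a, b), with
# runtime value a + b * x for the vector-length multiple x >= 0.

NUNITS = (16, 16)  # vector([16,16]) signed char


def poly_eval(p, x):
    """Value of the degree-1 poly_int (a, b) at runtime multiple x."""
    a, b = p
    return a + b * x


def printed(base):
    """Leading selector elements as printed: [base+j, base] four times each, j = 0..2."""
    return [(base + j, base) for j in range(3) for _ in range(4)]


# _109 prints plain 0, 1, 2, i.e. values with a zero second coefficient.
SELECTORS = {
    "_109": printed(0),
    "_110": printed(4),
    "_111": printed(8),
    "_112": printed(12),
}


def expand_sel(enc, npatterns, x, nelts):
    """Expand a stepped selector encoding into nelts concrete lane indices.

    Hypothetical model: enc holds 3 leading elements per pattern,
    interleaved; within a pattern, elements 1, 2, 3, ... form an
    arithmetic series with step e2 - e1.
    """
    out = []
    for i in range(nelts):
        p, k = i % npatterns, i // npatterns  # pattern, position in pattern
        e0 = poly_eval(enc[p], x)
        e1 = poly_eval(enc[npatterns + p], x)
        e2 = poly_eval(enc[2 * npatterns + p], x)
        out.append(e0 if k == 0 else e1 + (k - 1) * (e2 - e1))
    return out


if __name__ == "__main__":
    for x in (0, 1):
        n = poly_eval(NUNITS, x)
        for name, enc in SELECTORS.items():
            sel = expand_sel(enc, 4, x, n)
            print(f"x={x} nunits={n} {name}: {sel[:8]} ... {sel[-4:]}")
```

Running it prints, for x = 0 (16 lanes) and x = 1 (32 lanes), the first eight and last four indices of each selector, which can then be compared with the lane each output element is expected to read.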

The assembly is even weirder: the last two permutes appear to be CSEd into
the second, so the same mask is used for the second through fourth masked
stores.
