https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-01-31
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the SLP tree we discover is sound:

t2.c:11:14: note:   node 0x5db76f0 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note:   op template: *a_7(D) = _1;
t2.c:11:14: note:       stmt 0 *a_7(D) = _1;
t2.c:11:14: note:       stmt 1 MEM[(char *)a_7(D) + 1B] = _2;
t2.c:11:14: note:       stmt 2 MEM[(char *)a_7(D) + 2B] = _3;
t2.c:11:14: note:       stmt 3 MEM[(char *)a_7(D) + 3B] = _4;
t2.c:11:14: note:       stmt 4 MEM[(char *)a_7(D) + 4B] = _1;
t2.c:11:14: note:       stmt 5 MEM[(char *)a_7(D) + 5B] = _2;
t2.c:11:14: note:       stmt 6 MEM[(char *)a_7(D) + 6B] = _3;
t2.c:11:14: note:       stmt 7 MEM[(char *)a_7(D) + 7B] = _4;
t2.c:11:14: note:       children 0x5db7778
t2.c:11:14: note:   node 0x5db7778 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note:   op template: _1 = *b_6(D);
t2.c:11:14: note:       stmt 0 _1 = *b_6(D);
t2.c:11:14: note:       stmt 1 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note:       stmt 2 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note:       stmt 3 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note:       stmt 4 _1 = *b_6(D);
t2.c:11:14: note:       stmt 5 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note:       stmt 6 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note:       stmt 7 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note:       load permutation { 0 1 2 3 0 1 2 3 }
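
For orientation, source of roughly the following shape would be expected
to give a store group of eight bytes fed by a load group of four bytes
with that permutation.  This is a hypothetical reconstruction inferred
from the dump, not the testcase attached to the report; the function
name copy_twice and the temporaries are assumptions.

```c
/* Hypothetical reconstruction inferred from the dump above; this is
   NOT the testcase from the bug report.  Four bytes are loaded from b
   and each is stored twice, giving lanes { 0 1 2 3 0 1 2 3 }.  */
void
copy_twice (char *a, const char *b)
{
  /* The four scalar loads (_1 .. _4 in the dump).  */
  char t0 = b[0];
  char t1 = b[1];
  char t2 = b[2];
  char t3 = b[3];

  /* The eight scalar stores forming the vector(8) char store group.  */
  a[0] = t0;
  a[1] = t1;
  a[2] = t2;
  a[3] = t3;
  a[4] = t0;
  a[5] = t1;
  a[6] = t2;
  a[7] = t3;
}
```

Only b[0] through b[3] are read here, which is why a full eight-byte
vector load of b would touch memory the scalar code never accesses.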

The issue is, as so often:

t2.c:11:14: note:   ==> examining statement: _1 = *b_6(D);
t2.c:11:14: missed:   BB vectorization with gaps at the end of a load is not supported
t2.c:3:19: missed:   not vectorized: relevant stmt not supported: _1 = *b_6(D);
t2.c:11:14: note:   Building vector operands of 0x5db7778 from scalars instead

where we do little beyond ad-hoc handling of such "out-of-bound"
accesses: the load group covers only four bytes of b, so a vector(8)
char load would read four bytes past its end.  The obvious choice here
would be a single vector(4) load instead.
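
A minimal sketch of what that alternative amounts to, written as plain
C rather than as anything the vectorizer actually emits: one four-byte
load, a permute into eight lanes, and one eight-byte store.  The name
copy_twice_v4 and the perm table are assumptions made for illustration.

```c
#include <string.h>

/* Illustrative sketch only, not GCC's generated code: model a single
   vector(4) char load followed by the { 0 1 2 3 0 1 2 3 } permute and
   one vector(8) char store.  */
void
copy_twice_v4 (char *a, const char *b)
{
  static const unsigned char perm[8] = { 0, 1, 2, 3, 0, 1, 2, 3 };
  char lanes[4];
  char out[8];

  /* Single four-byte load; nothing past b[3] is read.  */
  memcpy (lanes, b, sizeof lanes);

  /* Apply the load permutation to widen four lanes to eight.  */
  for (int i = 0; i < 8; i++)
    out[i] = lanes[perm[i]];

  /* Single eight-byte store.  */
  memcpy (a, out, sizeof out);
}
```

The point of the sketch is that the narrower load stays within the
bytes the scalar code reads, so no gap at the end of the load remains.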
