https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-01-31
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the SLP tree we discover is sound:

t2.c:11:14: note:   node 0x5db76f0 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note:   op template: *a_7(D) = _1;
t2.c:11:14: note:     stmt 0 *a_7(D) = _1;
t2.c:11:14: note:     stmt 1 MEM[(char *)a_7(D) + 1B] = _2;
t2.c:11:14: note:     stmt 2 MEM[(char *)a_7(D) + 2B] = _3;
t2.c:11:14: note:     stmt 3 MEM[(char *)a_7(D) + 3B] = _4;
t2.c:11:14: note:     stmt 4 MEM[(char *)a_7(D) + 4B] = _1;
t2.c:11:14: note:     stmt 5 MEM[(char *)a_7(D) + 5B] = _2;
t2.c:11:14: note:     stmt 6 MEM[(char *)a_7(D) + 6B] = _3;
t2.c:11:14: note:     stmt 7 MEM[(char *)a_7(D) + 7B] = _4;
t2.c:11:14: note:     children 0x5db7778
t2.c:11:14: note:   node 0x5db7778 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note:   op template: _1 = *b_6(D);
t2.c:11:14: note:     stmt 0 _1 = *b_6(D);
t2.c:11:14: note:     stmt 1 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note:     stmt 2 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note:     stmt 3 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note:     stmt 4 _1 = *b_6(D);
t2.c:11:14: note:     stmt 5 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note:     stmt 6 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note:     stmt 7 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note:   load permutation { 0 1 2 3 0 1 2 3 }

the issue is as so often

t2.c:11:14: note:   ==> examining statement: _1 = *b_6(D);
t2.c:11:14: missed:   BB vectorization with gaps at the end of a load is not supported
t2.c:3:19: missed:   not vectorized: relevant stmt not supported: _1 = *b_6(D);
t2.c:11:14: note:   Building vector operands of 0x5db7778 from scalars instead

where we are not applying much non-ad-hoc work to deal with those
"out-of-bound" accesses.  The choice here would be obvious in doing a
single vector(4) load instead.
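
For readers without the attachment, here is a minimal sketch of the access pattern the dump describes. It is a reconstruction from the SLP nodes, not the reporter's actual t2.c, and the function names are hypothetical. The first function mirrors the eight stores fed by four loads (load permutation { 0 1 2 3 0 1 2 3 }). The second is a hand-written scalar analogue of the "single vector(4) load" idea: read only the four bytes that exist, then duplicate them, so nothing past b[3] is touched. It only illustrates the intended semantics and does not claim to show what the vectorizer would emit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the testcase: four byte loads from b,
   each value stored twice into a[0..7].  */
void
copy_twice_scalar (char *a, const char *b)
{
  char v1 = b[0], v2 = b[1], v3 = b[2], v4 = b[3];
  a[0] = v1; a[1] = v2; a[2] = v3; a[3] = v4;
  a[4] = v1; a[5] = v2; a[6] = v3; a[7] = v4;
}

/* Analogue of a single 4-element load followed by a duplicating
   permute: no access beyond b[3] is needed.  */
void
copy_twice_v4 (char *a, const char *b)
{
  char lo[4];
  memcpy (lo, b, sizeof lo);      /* one 4-byte load */
  memcpy (a, lo, sizeof lo);      /* lanes 0..3 */
  memcpy (a + 4, lo, sizeof lo);  /* lanes 4..7 repeat lanes 0..3 */
}

/* Asserts that both variants write the same eight bytes.  */
void
check_variants_agree (void)
{
  const char b[4] = { 'w', 'x', 'y', 'z' };
  char r1[8], r2[8];
  copy_twice_scalar (r1, b);
  copy_twice_v4 (r2, b);
  assert (memcmp (r1, r2, sizeof r1) == 0);
}
```

Both variants read all of b[0..3] before writing to a, matching the statement order in the dump, so they agree even if a and b overlap.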