https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118749
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
We are wrongly figuring the vectorized *string access is aligned.  We apply
peeling for alignment here, but the vector loop does not maintain the initial
alignment; instead it accesses a V16QI, effectively using only V8QI, and
increments the pointer by 8 elements.

  vect__2.17_108 = MEM <vector(16) unsigned char> [(FcChar8 *)vectp_string.15_106];
  vect__2.18_110 = VEC_PERM_EXPR <vect__2.17_108, vect__2.17_108, { 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7 }>;
  ...

This is because we vectorize an SLP reduction (but high/low are reductions),
so we get

t.c:16:18: note:   node 0x41737490 (max_nunits=16, refcnt=2) vector(16) unsigned char
t.c:16:18: note:   op template: _2 = *string_26;
t.c:16:18: note:        stmt 0 _2 = *string_26;
t.c:16:18: note:        stmt 1 _2 = *string_26;
t.c:16:18: note:        load permutation { 0 0 }

and the bug is that we think we can apply peeling for alignment for this
access with a VF of just 8.  When we then set the known misalignment to zero
with a target alignment of 16 bytes, that's when things go downhill.

Testing a patch.
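
To illustrate the arithmetic behind the analysis above, here is a minimal
standalone model in C.  It is not GCC code: peel_count and misaligned_loads
are hypothetical helper names, and the model assumes 1-byte elements (as with
unsigned char).  It peels scalar iterations until the address is a multiple of
the target alignment, then counts how many vector loads land on a misaligned
address when the pointer advances by a given step per vector iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of scalar iterations to peel so that ADDR
   becomes a multiple of TARGET_ALIGN, assuming 1-byte elements.  */
static size_t
peel_count (uintptr_t addr, size_t target_align)
{
  assert (target_align != 0);
  return (target_align - addr % target_align) % target_align;
}

/* Hypothetical helper: after peeling, run ITERS vector iterations that each
   advance the pointer by STEP bytes, and count the iterations whose load
   address is not a multiple of TARGET_ALIGN.  */
static size_t
misaligned_loads (uintptr_t addr, size_t target_align, size_t step,
                  size_t iters)
{
  uintptr_t p = addr + peel_count (addr, target_align);
  size_t bad = 0;

  for (size_t i = 0; i < iters; i++)
    {
      if (p % target_align != 0)
        bad++;
      p += step;
    }
  return bad;
}
```

In this model a 16-byte target alignment with a step of 8 bytes leaves every
second vector load misaligned, for example misaligned_loads (0x1003, 16, 8, 4)
counts two misaligned loads out of four, while a step of 16 bytes keeps every
load aligned.  That matches the description above: peeling only fixes the
first access, so treating the misalignment as known zero is valid only if the
per-iteration step preserves the target alignment.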