https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118749

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
We wrongly conclude that the vectorized *string access is aligned.  We apply
peeling for alignment here, but the vector loop does not maintain the
initial alignment: it loads a V16QI, effectively uses only the low V8QI,
and increments the pointer by only 8 elements.

  vect__2.17_108 = MEM <vector(16) unsigned char> [(FcChar8 *)vectp_string.15_106];
  vect__2.18_110 = VEC_PERM_EXPR <vect__2.17_108, vect__2.17_108, { 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7 }>;
...
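
To make the "only effectively using V8QI" point concrete, here is a small
C model of the permute above.  It is only a sketch: the helper names
(vec_perm_same_input, max_lane_used) and NUNITS are made up for
illustration and are not GCC internals.  It assumes the usual
VEC_PERM_EXPR meaning, where the selector indexes into the concatenation
of the two inputs, which here are the same vector.

```c
#include <assert.h>

enum { NUNITS = 16 };  /* lanes in a V16QI; illustrative constant */

/* Model VEC_PERM_EXPR <v, v, sel>: both inputs are the same vector, so a
   selector index into the 2*NUNITS-wide concatenation reduces to an
   index modulo NUNITS.  */
static void
vec_perm_same_input (const unsigned char v[NUNITS],
                     const unsigned char sel[NUNITS],
                     unsigned char out[NUNITS])
{
  for (int i = 0; i < NUNITS; i++)
    {
      assert (sel[i] < 2 * NUNITS);
      out[i] = v[sel[i] % NUNITS];
    }
}

/* Highest source lane the selector actually reads.  */
static unsigned
max_lane_used (const unsigned char sel[NUNITS])
{
  unsigned max = 0;
  for (int i = 0; i < NUNITS; i++)
    if (sel[i] % NUNITS > max)
      max = sel[i] % NUNITS;
  return max;
}
```

With the selector from the dump, max_lane_used reports lane 7, so lanes
8 to 15 of the 16-byte load never reach the result.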

This is because we vectorize an SLP reduction (both high and low are
reductions), so we get

t.c:16:18: note:   node 0x41737490 (max_nunits=16, refcnt=2) vector(16) unsigned char
t.c:16:18: note:   op template: _2 = *string_26;
t.c:16:18: note:        stmt 0 _2 = *string_26;
t.c:16:18: note:        stmt 1 _2 = *string_26;
t.c:16:18: note:        load permutation { 0 0 }

and the bug is that we think we can apply peeling for alignment to this
access with a VF of just 8.  Things go downhill once we then set the known
misalignment to zero against a target alignment of 16 bytes: each vector
iteration advances the pointer by only 8 bytes, so every other load is
misaligned.
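
The address arithmetic can be checked with a short C sketch.  This is not
the vectorizer's code: count_misaligned_loads and its parameters are
hypothetical, and it assumes the peeled loop starts at a 16-byte-aligned
address and steps by a fixed number of bytes per vector iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Count the vector iterations whose load address is not a multiple of
   target_align.  `start` is the address after peeling (assumed aligned),
   `step` is the pointer increment in bytes per vector iteration.  */
static unsigned
count_misaligned_loads (uintptr_t start, unsigned step,
                        unsigned target_align, unsigned niters)
{
  /* target_align must be a non-zero power of two.  */
  assert (target_align != 0 && (target_align & (target_align - 1)) == 0);
  /* Peeling is assumed to have aligned the first access.  */
  assert (start % target_align == 0);

  unsigned misaligned = 0;
  for (unsigned i = 0; i < niters; i++)
    {
      uintptr_t addr = start + (uintptr_t) i * step;
      if (addr % target_align != 0)
        misaligned++;
    }
  return misaligned;
}
```

For example, count_misaligned_loads (0x1000, 8, 16, 4) counts the loads at
0x1008 and 0x1018, while a step of 16 would keep every load aligned.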

Testing a patch.
