https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91178

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The main issue is that loop vectorization creates a chain of increments

  # vectp_f.21_116 = PHI <vectp_f.21_117(7), vectp_f.22_115(9)>
  vect__16.23_118 = MEM <vector(4) int> [(int *)vectp_f.21_116];
  vectp_f.21_119 = vectp_f.21_116 + 16;
  vectp_f.21_121 = vectp_f.21_119 + 16;
  vectp_f.21_123 = vectp_f.21_121 + 16;
  vectp_f.21_125 = vectp_f.21_123 + 16;
...
  vectp_f.21_182363 = vectp_f.21_182361 + 16;
  vectp_f.21_182365 = vectp_f.21_182363 + 16;
  vectp_f.21_182367 = vectp_f.21_182365 + 16;
  vect__16.91149_182369 = VEC_PERM_EXPR <vect__16.23_118, vect__16.23_118, { 0, 0, 0, 0 }>;
  vect__16.91150_182370 = VEC_PERM_EXPR <vect__16.23_118, vect__16.23_118, { 0, 0, 0, 0 }>;
  vect__16.91151_182371 = VEC_PERM_EXPR <vect__16.23_118, vect__16.22804_45680, { 0, 6, 6, 6 }>;
  vect__16.91152_182372 = VEC_PERM_EXPR <vect__16.22804_45680, vect__16.22804_45680, { 2, 2, 2, 2 }>;
  vect__16.91153_182373 = VEC_PERM_EXPR <vect__16.22804_45680, vect__16.45586_91244, { 2, 2, 4, 4 }>;
  vect__16.91154_182374 = VEC_PERM_EXPR <vect__16.45586_91244, vect__16.45586_91244, { 0, 0, 0, 0 }>;
  vect__16.91155_182375 = VEC_PERM_EXPR <vect__16.45586_91244, vect__16.68367_136806, { 0, 0, 0, 6 }>;
  vect__16.91156_182376 = VEC_PERM_EXPR <vect__16.68367_136806, vect__16.68367_136806, { 2, 2, 2, 2 }>;
  vect__16.91157_182377 = VEC_PERM_EXPR <vect__16.68367_136806, vect__16.68367_136806, { 2, 2, 2, 2 }>;
  vect__73.91158_182378 = vect__73.20_106 - vect__16.91149_182369;
  vect__73.91158_182379 = vect__73.20_107 - vect__16.91150_182370;
  vect__73.91158_182380 = vect__73.20_108 - vect__16.91151_182371;
  vect__73.91158_182381 = vect__73.20_109 - vect__16.91152_182372;
  vect__73.91158_182382 = vect__73.20_110 - vect__16.91153_182373;
  vect__73.91158_182383 = vect__73.20_111 - vect__16.91154_182374;
  vect__73.91158_182384 = vect__73.20_112 - vect__16.91155_182375;
  vect__73.91158_182385 = vect__73.20_113 - vect__16.91156_182376;
  vect__73.91158_182386 = vect__73.20_114 - vect__16.91157_182377;
  vectp_f.21_117 = vectp_f.21_182367 + 16;
  ivtmp_182463 = ivtmp_182462 + 1;
  if (ivtmp_182463 < bnd.17_102)
    goto <bb 7>; [0.00%]
  else
    goto <bb 11>; [100.00%]

where it first generates one load for each of the increments, and the
permutations then make most of those loads dead.  For interleaving we
have a cut-off to avoid this kind of code generation, but for SLP we
don't.  The DR group size is 91126 here and the gap is 91125 (i.e.
single-element interleaving).
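
To make the numbers concrete, here is a rough sketch in C of what
single-element interleaving looks like at the source level and of how
many vector loads it takes to cover whole groups contiguously.  The
function names, the loop shape and the vectorization factor of 4 are
assumptions for illustration only; they are not taken from the
testcase in the bug or from the vectorizer sources.

```c
#include <stddef.h>

/* Hypothetical loop shape: only the first int of each group of
   group_size ints is read, so the gap is group_size - 1.  */
int sum_group_leaders(const int *f, size_t n_groups, size_t group_size)
{
    int sum = 0;
    for (size_t i = 0; i < n_groups; ++i)
        sum += f[i * group_size];
    return sum;
}

/* Number of vector loads needed to cover vf whole groups when every
   element of each group is loaded contiguously (rounded up).  */
long loads_to_cover(long group_size, long vf, long lanes)
{
    return (group_size * vf + lanes - 1) / lanes;
}

/* Number of scalar elements that are actually used out of those
   vf groups, given the gap (unused elements per group).  */
long elements_used(long group_size, long gap, long vf)
{
    return (group_size - gap) * vf;
}
```

With the group size 91126 and gap 91125 quoted above, 4-lane vectors
and an assumed vectorization factor of 4, loads_to_cover (91126, 4, 4)
gives 91126 vector loads, while elements_used (91126, 91125, 4) gives
only 4 elements that are live after the permutations.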
