https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer for the original testcase generates

  # vect_sum_20.8_49 = PHI <vect_sum_16.21_75(6), { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }(9)>
...
  vect__9.20_68 = vect__5.12_55 * vect__8.16_61;
  vect__9.20_69 = vect__5.12_56 * vect__8.17_63;
  vect__9.20_70 = vect__5.12_57 * vect__8.18_65;
  vect__9.20_71 = vect__5.12_58 * vect__8.19_67;
  _9 = _5 * _8;
  vect_sum_16.21_72 = vect__9.20_68 + vect_sum_20.8_49;
  vect_sum_16.21_73 = vect__9.20_69 + vect_sum_16.21_72;
  vect_sum_16.21_74 = vect__9.20_70 + vect_sum_16.21_73;
  vect_sum_16.21_75 = vect__9.20_71 + vect_sum_16.21_74;
  sum_16 = _9 + sum_20;

The chained adds come from the optimization that reduces the number of
reduction IVs, which is done to lower register pressure.  (We could
alternatively keep them independent with 4 IVs and handle the reduction
in the epilogue.)
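
To make the two shapes concrete, here is a rough scalar sketch in C.  It
assumes the original testcase is a plain sum += a[i] * b[i] reduction over
floats, which the dump suggests but does not confirm.  The names
dot_chained and dot_split are made up for illustration, and each scalar
accumulator stands in for one of the four vector accumulators:

```c
#include <stddef.h>

/* Hypothetical stand-in for what the dump shows: a single reduction IV,
   so the four adds per iteration form one serial dependency chain.  */
float dot_chained(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum = a[i + 0] * b[i + 0] + sum;
        sum = a[i + 1] * b[i + 1] + sum;  /* waits for the add above */
        sum = a[i + 2] * b[i + 2] + sum;
        sum = a[i + 3] * b[i + 3] + sum;
    }
    for (; i < n; i++)                    /* scalar tail */
        sum += a[i] * b[i];
    return sum;
}

/* The alternative: four independent IVs, combined only in the epilogue.
   The adds within one iteration no longer depend on each other, at the
   cost of three more live accumulators.  */
float dot_split(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);    /* epilogue reduction */
    for (; i < n; i++)                    /* scalar tail */
        sum += a[i] * b[i];
    return sum;
}
```

The two functions associate the floating-point adds differently, so in
general they only agree up to rounding; they match exactly when every
partial sum is exactly representable.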

But this also shows that, if the issue isn't the multiple IVs, it could
be handled by reassoc + FMA forming, given that the vectorizer itself
doesn't produce FMAs here.
