https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113592
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer for the original testcase generates

  # vect_sum_20.8_49 = PHI <vect_sum_16.21_75(6), { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }(9)>
...
  vect__9.20_68 = vect__5.12_55 * vect__8.16_61;
  vect__9.20_69 = vect__5.12_56 * vect__8.17_63;
  vect__9.20_70 = vect__5.12_57 * vect__8.18_65;
  vect__9.20_71 = vect__5.12_58 * vect__8.19_67;
  _9 = _5 * _8;
  vect_sum_16.21_72 = vect__9.20_68 + vect_sum_20.8_49;
  vect_sum_16.21_73 = vect__9.20_69 + vect_sum_16.21_72;
  vect_sum_16.21_74 = vect__9.20_70 + vect_sum_16.21_73;
  vect_sum_16.21_75 = vect__9.20_71 + vect_sum_16.21_74;
  sum_16 = _9 + sum_20;

The chained adds come from the optimization that reduces the number of
reduction IVs (we could alternatively keep them independent with 4 IVs and
handle the reducing in the epilogue).  This is to reduce register pressure.
But it also shows that, if the issue isn't the multiple IVs, this could be
handled by reassoc + FMA forming, given that the vectorizer itself doesn't
produce FMAs here.
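
For illustration only, here is a scalar C model of the two reduction shapes discussed in this comment. It is a sketch under assumptions, not the testcase from the report and not what the vectorizer emits: the function names, the dot-product loop, and the constants LANES and NVEC are made up, chosen to mirror the 8-lane PHI initializer and the four multiplies per iteration in the dump. dot_chained keeps one accumulator vector and feeds all four products into it one after another (one IV, a serial add chain). dot_independent keeps four accumulator vectors and only combines them in the epilogue (4 IVs, more live registers).

```c
#include <stddef.h>

#define LANES 8   /* lanes per vector, as in the 8-element PHI initializer */
#define NVEC  4   /* vector multiplies per loop iteration in the dump */

/* Hypothetical model of the single-IV form: one accumulator vector, so the
   four adds per iteration depend on each other.
   Assumes n is a multiple of LANES * NVEC. */
float dot_chained(const float *a, const float *b, size_t n)
{
    float acc[LANES] = {0};

    for (size_t i = 0; i < n; i += LANES * NVEC)
        for (size_t v = 0; v < NVEC; v++)
            for (size_t l = 0; l < LANES; l++)
                acc[l] += a[i + v * LANES + l] * b[i + v * LANES + l];

    /* Epilogue: reduce the lanes of the single accumulator. */
    float sum = 0.0f;
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];
    return sum;
}

/* Hypothetical model of the alternative: four independent accumulator
   vectors, combined only after the loop.
   Assumes n is a multiple of LANES * NVEC. */
float dot_independent(const float *a, const float *b, size_t n)
{
    float acc[NVEC][LANES] = {{0}};

    for (size_t i = 0; i < n; i += LANES * NVEC)
        for (size_t v = 0; v < NVEC; v++)
            for (size_t l = 0; l < LANES; l++)
                acc[v][l] += a[i + v * LANES + l] * b[i + v * LANES + l];

    /* Epilogue: first combine the four accumulators, then the lanes. */
    float sum = 0.0f;
    for (size_t l = 0; l < LANES; l++) {
        float t = 0.0f;
        for (size_t v = 0; v < NVEC; v++)
            t += acc[v][l];
        sum += t;
    }
    return sum;
}
```

The two functions add the same products in a different order, so they agree exactly only when every intermediate sum is representable; in general the results can differ by rounding, which is why this kind of reordering of a float reduction needs reassociation to be permitted.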