https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108950
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's because we do
  if (slp_node
      && !(!single_defuse_cycle
           && !lane_reduc_code_p
           && reduction_type != FOLD_LEFT_REDUCTION))
    for (i = 0; i < (int) op.num_ops; i++)
      if (!vect_maybe_update_slp_op_vectype (slp_op[i], vectype_op[i]))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "incompatible vector types for invariants\n");
          return false;
        }
That triggers for the invariant lower-precision operand of the reduction.
Arguably that's not efficient. We're detecting a widen_sum from
  n.0_1 = n;
  _2 = (int) n.0_1;
  m_lsm.19_13 = m;

  <bb 3> [local count: 1073741824]:
  # iter.6_8 = PHI <0(2), iter.6_9(5)>
  # m_lsm.19_12 = PHI <m_lsm.19_13(2), _5(5)>
  # ivtmp_7 = PHI <2(2), ivtmp_18(5)>
  _4 = _2 + m_lsm.19_12;
which is where we could fix this. I have a patch for the reduction code
for now.