https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108950

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's because we do

  if (slp_node
      && !(!single_defuse_cycle
           && !lane_reduc_code_p
           && reduction_type != FOLD_LEFT_REDUCTION))
    for (i = 0; i < (int) op.num_ops; i++)
      if (!vect_maybe_update_slp_op_vectype (slp_op[i], vectype_op[i]))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "incompatible vector types for invariants\n");
          return false;
        }

for the invariant lower-precision operand of the reduction.  Arguably that's
not efficient.  We're detecting a widen_sum from

  n.0_1 = n;
  _2 = (int) n.0_1;
  m_lsm.19_13 = m;

  <bb 3> [local count: 1073741824]:
  # iter.6_8 = PHI <0(2), iter.6_9(5)>
  # m_lsm.19_12 = PHI <m_lsm.19_13(2), _5(5)>
  # ivtmp_7 = PHI <2(2), ivtmp_18(5)>
  _4 = _2 + m_lsm.19_12;

which is where we could fix this.  I have a patch for the reduction code
for now.
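
For reference, a minimal C sketch of the kind of loop this corresponds to
(names, types and the trip count are guesses reconstructed from the GIMPLE
above, not the actual testcase): the narrower operand of the would-be
widen_sum is the loop-invariant conversion of n, and that invariant is what
the SLP check quoted above rejects.

  /* Hypothetical reduced example; n's narrower type and the bound of 2
     are inferred from the dump above and may not match the real PR
     testcase.  */
  short n;
  int m;

  void
  f (void)
  {
    for (int iter = 0; iter < 2; iter++)
      m = m + n;   /* (int) n is hoisted out of the loop as invariant _2 */
  }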
