https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117594

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So I think the issue is we have an induction

  <bb 2>
  a_lsm.11_27 = a;
  _55 = VEC_SERIES_EXPR <a_lsm.11_27, 4294967287>;

  <bb 3> [local count: 955630224]:
  # vect_vec_iv_.24_56 = PHI <_57(11), _55(2)>
  # ivtmp_64 = PHI <ivtmp_65(11), _63(2)>
  loop_len_44 = MIN_EXPR <ivtmp_64, POLY_INT_CST [8, 8]>;
  _57 = vect_vec_iv_.24_56 + { POLY_INT_CST [4294967260, 4294967260], ... };
..

  <bb 5>:
  # vect__3.25_59 = PHI <vect__3.25_58(3)>
  # loop_len_66 = PHI <loop_len_44(3)>
  _60 = loop_len_66 + 18446744073709551615;
  _61 = .VEC_EXTRACT (vect__3.25_59, _60);
  a_lsm.11_35 = _61;

where the _56 increment is -9 * 4 == -36 aka POLY_INT_CST [-36, -36], whatever
that means, oddly "disconnected" from the loop_len = MIN <ivtmp, POLY_INT_CST
[8, 8]> check.
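
For reference, the large unsigned constants in the dump are small negative
values once wrapped to their type width.  A minimal sketch of that
arithmetic follows; the helper names as_signed32 and as_signed64 are made
up for illustration and are not GCC functions.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit unsigned dump constant as the
   signed value it wraps to (two's complement), without relying on
   implementation-defined narrowing conversions.  */
static int64_t
as_signed32 (uint32_t v)
{
  return v <= INT32_MAX ? (int64_t) v : (int64_t) v - 4294967296LL;
}

/* Hypothetical helper: the same for a 64-bit unsigned constant.  */
static int64_t
as_signed64 (uint64_t v)
{
  return v <= INT64_MAX ? (int64_t) v : -(int64_t) (UINT64_MAX - v) - 1;
}
```

With these, the VEC_SERIES_EXPR step 4294967287 reads as -9, the per-vector
increment 4294967260 reads as -36 (that is 4 * -9 for the 4-lane base of
the [4,4] vector), and the 18446744073709551615 added to loop_len_66 is
simply -1.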

Ah, so this IV has vector([4,4]) unsigned int, VF is [4, 4] as well.  But
at the same time we have vector([8,8]) unsigned short data, which oddly has
two lanes, the representative node being

t.c:7:28: note:   node 0x5cf0bc0 (max_nunits=8, refcnt=2) vector([8,8])
unsigned short
t.c:7:28: note:   op template: _8 = e.3_5 + b.5_7;
t.c:7:28: note:         stmt 0 _8 = e.3_5 + b.5_7;
t.c:7:28: note:         stmt 1 _9 = _8 + 1;
t.c:7:28: note:         children 0x5cf0c58 0x5cf0cf0

this is how we represent SLP reduction chains, so the VF is correct here,
it's then the MIN() that looks wrong?

We do vect_set_loop_controls_directly with vector([8,8]) unsigned short and

$15 = {max_nscalars_per_iter = 2, factor = 1, 
  type = <vector_type 0x7ffff71e6f18>, compare_type = <tree 0x0>, controls = {
    m_vec = 0x5d5fd00 = {0x7ffff6e088b8}}, bias_adjusted_ctrl = <tree 0x0>}


But I suppose the issue is that vectorizable_live_operation_1 picks the
"wrong" len via

      tree len = vect_get_loop_len (loop_vinfo, &gsi,
                                    &LOOP_VINFO_LENS (loop_vinfo),
                                    1, vectype, 0, 0);

given we only have a rgroup control for vector([8,8]) unsigned short?!

Indeed for the [4,4] vectype we're getting the above rgroup control
for [8,8] with max_nscalars_per_iter == 2 since we pass factor == 0
(which doesn't make much sense?!), we do that in a few places.
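
To illustrate why a length taken from the [8,8] control does not fit the
[4,4] vector, here is a toy model with fixed lane counts (8 and 4, i.e. the
runtime multiplier of the POLY_INT_CST taken as zero).  All function names
are hypothetical, and the rescaling shown is only an assumption about what
a matching length would look like; it is not the fix mentioned below.

```c
#include <stddef.h>

/* loop_len = MIN (remaining, nunits) as in the dump, counted in lanes
   of the control's vector type.  */
static size_t
toy_loop_len (size_t remaining, size_t control_nunits)
{
  return remaining < control_nunits ? remaining : control_nunits;
}

/* Lane index as computed in the dump: len - 1, with no rescaling.  */
static size_t
last_lane_unscaled (size_t len)
{
  return len - 1;
}

/* Assumed rescaling: the control counts nscalars_per_iter items per
   scalar iteration, while the IV vector has one lane per iteration.  */
static size_t
last_lane_rescaled (size_t len, size_t nscalars_per_iter)
{
  return len / nscalars_per_iter - 1;
}
```

For a full vector iteration toy_loop_len (100, 8) is 8, so the unscaled
index is 7, which is past the end of a 4-lane vector, while the rescaled
index with max_nscalars_per_iter == 2 is 3, the last valid lane.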

I have a fix.
