https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101842
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #3)
> (In reply to Richard Biener from comment #2)
> > OK, so with a hack like the following we vectorize the BB as
> >
> >   vect__1.10_62 = MEM <vector(4) float> [(float *)p_34];
> >   vect_powmult_9.11_61 = vect__1.10_62 * vect__1.10_62;
> >   _60 = .REDUC_PLUS (vect_powmult_9.11_61);
> >   d_25 = d_35 - _60;
> >   p_26 = p_34 + 16;
> >   i_27 = i_37 + 4;
> >   _10 = len_20(D) > i_27;
> >   _11 = lim_21(D) <= d_25;
> >   _12 = _10 & _11;
> >   if (_12 != 0)
> >
>
> Ah awesome!
>
> >
> > the hack simply re-starts reduction discovery at the "previous" stmt
> > (this breaks down after skipping the first stmt eventually). As said,
> > it's a hack. But is that the kind of vectorization you expect?
>
> Yeah that looks perfect, the patch seems to be based on a different code
> than upstream so couldn't apply it to test the full loop, but this looks
> perfect! (We already vectorize a similar loop without the `&& d >= lim`
> condition).

It's applied to my working tree so that's possible. Note it doesn't
vectorize the loop but the loop body in basic-block vectorization.
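
For readers following along, here is a rough C sketch of the kind of loop whose body could lower to the GIMPLE quoted above. It is a guess reconstructed from the dump, not the testcase attached to this PR: the function name `drain_budget`, the `d_out` parameter, and the multiple-of-four assumption on `len` are all made up for illustration. The point is that the body is straight-line code (four loads, four multiplies, one sum), which is why it is the basic-block vectorizer rather than the loop vectorizer that handles it, with the `i < len && d >= lim` exit test left scalar.

```c
#include <assert.h>

/* Hypothetical reconstruction for illustration only; not the PR testcase.
   Each iteration subtracts the sum of squares of four floats from d and
   stops when the input is exhausted or d drops below lim. */
int drain_budget(const float *p, int len, float d, float lim, float *d_out)
{
    assert(len % 4 == 0);  /* assumption: input comes in whole groups of four */
    int i = 0;
    while (i < len && d >= lim) {
        /* Straight-line body: corresponds to the vector load, the vector
           multiply, and the .REDUC_PLUS in the quoted dump. */
        d -= p[0] * p[0] + p[1] * p[1] + p[2] * p[2] + p[3] * p[3];
        p += 4;  /* p_26 = p_34 + 16 (bytes) */
        i += 4;  /* i_27 = i_37 + 4 */
    }
    *d_out = d;  /* final value of d, returned for inspection */
    return i;    /* number of elements consumed */
}
```

With twelve inputs `{1,1,1,1, 2,2,2,2, 3,3,3,3}`, `d = 20` and `lim = 5`, the sketch consumes eight elements and leaves `d` at 0. Whether a given GCC actually turns the body into a vector multiply plus `.REDUC_PLUS` depends on the release and flags; reassociating the float sum presumably needs something like `-ffast-math`, which is an assumption here and not something stated in this thread.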