https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So one interesting speciality of this testcase is that the ifs switch
between two scalar values and overall there's no control flow effect.

That is, the issue of splitting the dataref groups, which we currently
do at BB granularity, could be solved by not splitting groups when we
are sure all members of the group are either executed or not executed
(which is the real intent of this splitting).

In fact the current dataref_group computation seems to be useful only
for making sure to split groups _inside_ BBs at suitable points, and the
cross-BB split is ensured by vect_analyze_data_ref_accesses. We'd need
to enhance the dataref_group computation to be conservative for
cross-BB groups in order to relax the latter (and for outer loop vect
compute it there as well).

The simplest correctness fix is to ensure the group_id is bumped when
going from one BB to the next. That would help this case up to
encountering the PHIs/ifs, which we only vectorize when they are in the
same BB.

inline unsigned opt(unsigned a, unsigned b, unsigned c, unsigned d)
{
  return a > b ? c : d;
}

void opt( unsigned * __restrict dst, const unsigned *pa, const unsigned *pb,
          const unsigned *pc, const unsigned *pd )
{
  unsigned tem = opt(*pa++, *pb++, *pc++, *pd++);
  unsigned tem1 = opt(*pa++, *pb++, *pc++, *pd++);
  unsigned tem2 = opt(*pa++, *pb++, *pc++, *pd++);
  unsigned tem3 = opt(*pa++, *pb++, *pc++, *pd++);
  *dst++ = tem;
  *dst++ = tem1;
  *dst++ = tem2;
  *dst++ = tem3;
}

ends up with

  _35 = {iftmp.24_22, iftmp.24_23, iftmp.24_24, iftmp.24_25};
  vectp.30_34 = dst_26(D);
  MEM <vector(4) unsigned int> [(unsigned int *)vectp.30_34] = _35;

SLP discovery stops at the PHIs, which are spread out (and the loads are
still spread out as well).
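To illustrate the remark that the ifs only switch between two scalar values and have no control flow effect, here is a minimal sketch in plain C. It is not what GCC emits and not a proposed fix. The names select_branchy, select_blend, opt4_blend and check_lanes are hypothetical and exist only for this example. The point is that each lane's select can be written as a mask blend, so all four lanes share one straight-line body with no PHIs spread over several BBs.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar select as in the testcase: the branch only picks a value. */
static unsigned select_branchy(unsigned a, unsigned b, unsigned c, unsigned d)
{
    if (a > b)
        return c;
    return d;
}

/* Same select without control flow (hypothetical helper).
   mask is all-ones when a > b and zero otherwise. */
static unsigned select_blend(unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned mask = 0u - (unsigned)(a > b);
    return (c & mask) | (d & ~mask);
}

/* Four-lane version of the testcase with one uniform body per lane. */
static void opt4_blend(unsigned *restrict dst,
                       const unsigned *pa, const unsigned *pb,
                       const unsigned *pc, const unsigned *pd)
{
    for (size_t i = 0; i < 4; ++i)
        dst[i] = select_blend(pa[i], pb[i], pc[i], pd[i]);
}

/* Illustrative self-check: the blend form agrees with the branchy form
   on each lane, including the a == b case. */
static void check_lanes(void)
{
    const unsigned a[4] = {5, 1, 7, 7};
    const unsigned b[4] = {3, 2, 7, 0};
    const unsigned c[4] = {10, 20, 30, 40};
    const unsigned d[4] = {11, 21, 31, 41};
    unsigned out[4];

    opt4_blend(out, a, b, c, d);
    for (size_t i = 0; i < 4; ++i)
        assert(out[i] == select_branchy(a[i], b[i], c[i], d[i]));
}
```

Calling check_lanes() exercises all four lanes. Whether a given GCC version vectorizes either form is a separate question that this sketch does not answer.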