https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So if we agree on a sane way to cost branchy code on the scalar side, then it should be possible to compare the scalar cost of the inner loop body that is not if-converted against the cost of the full, partially vectorized and if-converted inner loop body.

vect_bb_vectorization_profitable_p would have to add the cost of the scalar stmts not covered by vectorization. This set is conveniently available as the set of stmts that do not have the visited flag set before we clear it here:

  vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
                                      vec<slp_instance> slp_instances)
  {
  ...
    /* Unset visited flag.  */
    stmt_info_for_cost *cost;
    FOR_EACH_VEC_ELT (scalar_costs, i, cost)
      gimple_set_visited (cost->stmt_info->stmt, false);

So we'd need to walk over all stmts in the BB and add the cost of the unmarked stmts to the vector cost. We'd want to force a single SLP "subgraph" in this mode to avoid going over the whole block multiple times.