https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65660
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
With a true fix I get

t.c:3:20: note: Cost model analysis:
  Vector inside of loop cost: 4
  Vector prologue cost: 13
  Vector epilogue cost: 11
  Scalar iteration cost: 4
  Scalar outside cost: 0
  Vector outside cost: 24
  prologue iterations: 2
  epilogue iterations: 2
  Calculated minimum iters for profitability: 7
t.c:3:20: note: Runtime profitability threshold = 6
t.c:3:20: note: Static estimate profitability threshold = 6

so we still vectorize the loop for bdver2. This is because of an oddity in its cost model, which has

  6,  /* scalar_stmt_cost. */
  4,  /* scalar load_cost. */
  4,  /* scalar_store_cost. */
  6,  /* vec_stmt_cost. */
  0,  /* vec_to_scalar_cost. */
  2,  /* scalar_to_vec_cost. */
  4,  /* vec_align_load_cost. */
  4,  /* vec_unalign_load_cost. */
  4,  /* vec_store_cost. */
  2,  /* cond_taken_branch_cost. */
  1,  /* cond_not_taken_branch_cost. */

and thus the prologue/epilogue is not pessimized enough for the extra branches (which are very cheap compared to the scalar and vector stmt costs).

I am still testing the patch to avoid the round-off errors and to really account for scalar stmts correctly. I suppose the EON regression should instead be fixed by avoiding the peeling for alignment, given a better estimate of its cost.