The following is something I noticed while looking for a way to fix PR81303. We happily compute a runtime cost model threshold that executes the vectorized variant even though no vector iteration takes place, because the prologue/epilogue iterations use up the available iterations. The following fixes that. Note that if we do not know the prologue/epilogue counts statically they are estimated at vf/2, which means there's still a chance the vector iteration won't execute. To fix that we'd have to estimate those as vf-1 instead, something we might consider doing anyway given that we regularly completely peel the epilogues vf-1 times in that case. Maybe as a followup.
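To make the difference concrete, here is a minimal standalone sketch of the old and new clamping. The two helper functions are hypothetical and exist only for illustration; they are not part of GCC, and only the variable names mirror those used in vect_estimate_min_profitable_iters.

```c
#include <assert.h>

/* Hypothetical helper: the old clamp, which only guarantees that the
   threshold is at least one vectorization factor.  */
static int
old_min_profitable_iters (int min_profitable_iters, int vf)
{
  assert (vf > 0);
  return min_profitable_iters < vf ? vf : min_profitable_iters;
}

/* Hypothetical helper: the new clamp, which also accounts for the
   scalar iterations spent in the prologue and epilogue, so that at
   least one vector iteration remains once those are peeled.  */
static int
new_min_profitable_iters (int min_profitable_iters, int vf,
                          int peel_iters_prologue, int peel_iters_epilogue)
{
  assert (vf > 0 && peel_iters_prologue >= 0 && peel_iters_epilogue >= 0);
  int one_vector_iter = vf + peel_iters_prologue + peel_iters_epilogue;
  if (min_profitable_iters < one_vector_iter)
    min_profitable_iters = one_vector_iter;
  return min_profitable_iters;
}
```

For example, with vf = 4, three prologue and three epilogue iterations, and a cost model result of 5, the old clamp leaves the threshold at 5 while the new one raises it to 10. A result that is already large enough (say 20) is left unchanged by both.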

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-07-21  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/81303
	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Take
	into account prologue and epilogue iterations when raising
	min_profitable_iters to sth at least covering one vector iteration.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 250384)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -3702,8 +3702,9 @@ vect_estimate_min_profitable_iters (loop
                      " Calculated minimum iters for profitability: %d\n",
                      min_profitable_iters);
 
-  min_profitable_iters =
-    min_profitable_iters < vf ? vf : min_profitable_iters;
+  /* We want the vectorized loop to execute at least once.  */
+  if (min_profitable_iters < (vf + peel_iters_prologue + peel_iters_epilogue))
+    min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,