https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
--- Comment #30 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 11 Oct 2019, wilco at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
>
> --- Comment #29 from Wilco <wilco at gcc dot gnu.org> ---
> (In reply to Jiu Fu Guo from comment #28)
> > For these kinds of small loops, it would be acceptable to unroll in GIMPLE,
> > because register pressure and instruction cost may not be major concerns;
> > just like the "cunroll" and "cunrolli" passes (complete unrolling), which
> > are also done at -O2.
>
> Absolutely, unrolling is a high-level optimization like vectorization.

To expose ILP? I'd call that low-level though ;)

If it exposes data reuse then I'd call it high-level - and at that level we
already have passes like predictive commoning or unroll-and-jam doing exactly
that. Or vectorization.

We've shown through data that unrolling without a good idea of CPU pipeline
details is a loss on x86_64. This further hints at it being low-level.
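
A minimal sketch of that distinction (the loops and names are illustrative,
not taken from this PR):

/* Case 1: unrolling only exposes ILP.  Every iteration is independent, so
   unrolling just gives the scheduler more in-flight operations; whether that
   pays off depends on issue width, latencies and other pipeline details -
   low-level CPU knowledge.  */
void axpy (double *restrict y, const double *restrict x, double a, int n)
{
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}

/* Case 2: unroll-and-jam on the outer loop (as -floop-unroll-and-jam does)
   exposes data reuse: two adjacent i iterations share the load of b[j], so
   the jammed inner body loads it once instead of twice.  That transformation
   is visible at the GIMPLE level without any pipeline model.  */
void mat_vec (double *restrict c, const double *restrict a,
              const double *restrict b, int n, int m)
{
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      c[i] += a[i * m + j] * b[j];
}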