[Bug tree-optimization/88760] GCC unrolling is suboptimal

rguenther at suse dot de Mon, 14 Oct 2019 03:48:24 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760

--- Comment #40 from rguenther at suse dot de <rguenther at suse dot de> ---
On Sat, 12 Oct 2019, guojiufu at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
> 
> --- Comment #39 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
> For a small loop (1-2 stmts), in GIMPLE or RTL form it would be around 5-10
> instructions: 2-4 insns per stmt, plus ~4 insns for idx.
> 
> With the current unroller, here are some statistics on SPEC2017.
> Using --param max-unrolled-insns=12, ~3000 small loops could be unrolled in
> total, and ~40 of these small loops are located in hot functions.
> 
> Using --param max-unrolled-insns=16, ~11000 small loops could be unrolled in
> total, and ~230 of these small loops are located in hot functions.
> 
> Using --param max-unrolled-insns=20, ~15000 small loops could be unrolled in
> total, and ~570 of these small loops are located in hot functions.
> 
> Using --param max-unrolled-insns=24, ~18000 small loops could be unrolled in
> total, and ~680 of these small loops are located in hot functions.
> 
> 
> If max-unrolled-insns < 16, only a few small loops in hot functions are
> unrolled, so it may not be very valuable.
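As a quick restatement of the quoted figures (all approximate, taken as-is from comment #39), the following C sketch computes what share of the unrollable small loops sits in hot functions at each limit, and how many hot loops each step of the limit adds. The struct, table and function names are made up for illustration; nothing here comes from GCC itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Approximate SPEC2017 counts quoted from comment #39 (hypothetical layout). */
struct unroll_stat {
    int limit;       /* value of --param max-unrolled-insns */
    int small_loops; /* small loops that could be unrolled in total */
    int hot_loops;   /* ...of which located in hot functions */
};

static const struct unroll_stat stats[] = {
    {12, 3000, 40},
    {16, 11000, 230},
    {20, 15000, 570},
    {24, 18000, 680},
};

#define NSTATS (sizeof stats / sizeof stats[0])

/* Share (in percent) of the unrollable small loops that are hot. */
static double hot_percent(size_t i)
{
    assert(i < NSTATS);
    return 100.0 * stats[i].hot_loops / stats[i].small_loops;
}

/* Extra hot loops gained by raising the limit from the previous row. */
static int hot_gained(size_t i)
{
    assert(i > 0 && i < NSTATS);
    return stats[i].hot_loops - stats[i - 1].hot_loops;
}

/* Print one summary line per limit. */
static void print_stats(void)
{
    for (size_t i = 0; i < NSTATS; i++)
        printf("limit %2d: %5d small loops, %3d hot (%.1f%%)\n",
               stats[i].limit, stats[i].small_loops,
               stats[i].hot_loops, hot_percent(i));
}
```

Calling `print_stats()` shows the hot share rising from about 1.3% (limit 12) to 2.1% (16) and 3.8% (20 and 24), while `hot_gained` gives +190, +340 and +110 hot loops for the steps 12→16, 16→20 and 20→24.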

So with a limit of 12, unrolling twice already means 6 insns per copy;
minus the IV update and compare-and-branch (assuming a single pattern),
that's 4 insns.  On GIMPLE I'd already call this large, since any memory
loads and stores would be separate there, so it would be ~16 instead of 12.
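The budget arithmetic above can be written down as a tiny function. This is a simplified model chosen for illustration, not GCC's actual cost computation: it assumes the whole unrolled body must fit in the insn limit, that every copy is the same size, and that loop control costs a fixed number of insns per copy (here 2: the IV update plus a single compare-and-branch pattern). The name `payload_insns` is hypothetical.

```c
#include <assert.h>

/* Simplified model (assumption, not GCC's cost code): a loop body unrolled
   FACTOR times must fit in LIMIT insns, so each copy gets LIMIT / FACTOR
   insns.  OVERHEAD of those are loop control (IV update and
   compare-and-branch); whatever remains is the loop's real work.
   Returns 0 if nothing is left for real work. */
static int payload_insns(int limit, int factor, int overhead)
{
    assert(factor > 0 && overhead >= 0);
    int per_copy = limit / factor;
    return per_copy > overhead ? per_copy - overhead : 0;
}
```

Under this model `payload_insns(12, 2, 2)` is 4, matching the reasoning above, and with the default unroll factor of 8 the same limit of 12 leaves only one insn per copy, so `payload_insns(12, 8, 2)` is 0.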

I think the better approach is to identify the cases where unrolling
would help, and on which (sub-)architectures, and prepare testcases
for them.

I guess the days when our default unroll factor of 8 (if it fits the
size limits) was a good idea are long gone; I'd expect ILP to stop
improving much earlier (depending on the set of operations).
For ILP you also want to interleave the unrolled iterations, so I
point to SMS (swing modulo scheduling) again here.  SMS suffers from
loop dependence info being weak on RTL, but it does use the target's
scheduler model.
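To make "interleaving of the unrolled iterations" concrete, here is a hand-written C sketch of a reduction unrolled by 2 where the two copies feed independent accumulators, so their additions do not wait on each other. This is a manual source-level illustration under that assumption only; it is not what GCC's RTL unroller or SMS actually emit, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Plain reduction: every addition depends on the previous one, so the
   loop is bound by the latency of one dependent add chain. */
static long sum_rolled(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 2 with the copies interleaved into independent
   accumulators s0/s1, so two adds can be in flight at once.  Integer
   addition is associative, so the result is identical; for floating
   point this reassociation would change rounding. */
static long sum_unrolled2(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* odd trip count: one leftover element */
        s0 += a[i];
    return s0 + s1;
}
```

Going from two to more accumulators follows the same pattern, but as noted above the ILP gain is expected to flatten out well before a factor of 8.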
