On Tue, Jun 17, 2014 at 4:59 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> Hi,
> I am looking at a performance regression in our code. A big loop produces
> and uses a lot of temporary variables inside the loop body. The problem
> appears that IVOPTS pass creates even more induction variables (from original
> 2 to 27). It causes a lot of register spilling later and performance
> take a severe hit. I looked into tree-ssa-loop-ivopts.c, it does call
> estimate_reg_pressure_cost function to take # of registers into
> consideration. The second parameter passed as data->regs_used is supposed
> to represent old register usage before IVOPTS.
>
>   return size + estimate_reg_pressure_cost (size, data->regs_used, 
> data->speed,
>                                             data->body_includes_call);
>
> In this case, it is mere 2 by following calculation. Essentially, it only 
> counts
> all loop invariant registers, ignoring all registers produced/used inside the 
> loop.
>
>   n = 0;
>   for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi))
>     {
>       phi = gsi_stmt (psi);
>       op = PHI_RESULT (phi);
>
>       if (virtual_operand_p (op))
>         continue;
>
>       if (get_iv (data, op))
>         continue;
>
>       n++;
>     }
>
>   EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, j, bi)
>     {
>       struct version_info *info = ver_info (data, j);
>
>       if (info->inv_id && info->has_nonlin_use)
>         n++;
>     }
>
>   data->regs_used = n;
>
> I believe how regs_used is calculated is seriously flawed,
> or estimate_reg_pressure_cost is problematic if n_old is
> only supposed to be loop invariant registers. Either way,
> it affects how IVOPTS makes decision and could result in
> worse code. What do you think? Any idea on how to improve
> this?

Well, it's certainly a lower bound on the number of registers
live through the whole loop execution (thus over the backedge).
So they have the same cost as an induction variable as far
as register pressure is concerned.

What it doesn't account for is the maximum number of live
registers anywhere in the loop body - but that is hard to
estimate at this point in the compilation.  You could compute
the maximum number of live SSA names which could be
an upper bound on the register pressure - but that needs
liveness analysis which is expensive also that upper bound
is probably way too high.

So I think the current logic is sensible and simple.  It's just
not perfect.

Maybe it's just the cost function of the IV set choosen that
needs to be adjusted to account for the number of IVs
in a non-linear way?  That is, adjust ivopts_global_cost_for_size
which just adds size to sth that pessimizes more IVs even
more like size * (1 + size / (1 + data->regs_used)) or
simply size ** (1. + eps) with a suitable eps < 2.

Richard.

>
> Thanks,
> Bingfeng
>

Reply via email to