> -----Original Message----- > From: Richard Biener [mailto:richard.guent...@gmail.com] > Sent: 18 June 2014 12:36 > To: Bingfeng Mei > Cc: gcc@gcc.gnu.org > Subject: Re: regs_used estimation in IVOPTS seriously flawed > > On Tue, Jun 17, 2014 at 4:59 PM, Bingfeng Mei <b...@broadcom.com> wrote: > > Hi, > > I am looking at a performance regression in our code. A big loop > produces > > and uses a lot of temporary variables inside the loop body. The > problem > > appears that IVOPTS pass creates even more induction variables (from > original > > 2 to 27). It causes a lot of register spilling later and performance > > take a severe hit. I looked into tree-ssa-loop-ivopts.c, it does call > > estimate_reg_pressure_cost function to take # of registers into > > consideration. The second parameter passed as data->regs_used is > supposed > > to represent old register usage before IVOPTS. > > > > return size + estimate_reg_pressure_cost (size, data->regs_used, > data->speed, > > data->body_includes_call); > > > > In this case, it is mere 2 by following calculation. Essentially, it > only counts > > all loop invariant registers, ignoring all registers produced/used > inside the loop. > > > > n = 0; > > for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next > (&psi)) > > { > > phi = gsi_stmt (psi); > > op = PHI_RESULT (phi); > > > > if (virtual_operand_p (op)) > > continue; > > > > if (get_iv (data, op)) > > continue; > > > > n++; > > } > > > > EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, j, bi) > > { > > struct version_info *info = ver_info (data, j); > > > > if (info->inv_id && info->has_nonlin_use) > > n++; > > } > > > > data->regs_used = n; > > > > I believe how regs_used is calculated is seriously flawed, > > or estimate_reg_pressure_cost is problematic if n_old is > > only supposed to be loop invariant registers. Either way, > > it affects how IVOPTS makes decision and could result in > > worse code. What do you think? Any idea on how to improve > > this? > > Well, it's certainly a lower bound on the number of registers > live through the whole loop execution (thus over the backedge). > So they have the same cost as an induction variable as far > as register pressure is concerned. > > What it doesn't account for is the maximum number of live > registers anywhere in the loop body - but that is hard to > estimate at this point in the compilation. You could compute > the maximum number of live SSA names which could be > an upper bound on the register pressure - but that needs > liveness analysis which is expensive also that upper bound > is probably way too high. > Yes, I agree it is hard and probably expensive at this stage of compilation to do accurate analysis. But it could be quite useful for many tree-level loop optimizations, even just a half-accurate estimation for register pressure, as also discussed in another thread a few days ago.
> So I think the current logic is sensible and simple. It's just > not perfect. > > Maybe it's just the cost function of the IV set choosen that > needs to be adjusted to account for the number of IVs > in a non-linear way? That is, adjust ivopts_global_cost_for_size > which just adds size to sth that pessimizes more IVs even > more like size * (1 + size / (1 + data->regs_used)) or > simply size ** (1. + eps) with a suitable eps < 2. > I am going to try a few cost functions as you suggested. Maybe also just count all SSA together and divide it by a factor. Thanks, Bingfeng