On Tue, Jun 17, 2014 at 4:59 PM, Bingfeng Mei <b...@broadcom.com> wrote: > Hi, > I am looking at a performance regression in our code. A big loop produces > and uses a lot of temporary variables inside the loop body. The problem > appears that IVOPTS pass creates even more induction variables (from original > 2 to 27). It causes a lot of register spilling later and performance > take a severe hit. I looked into tree-ssa-loop-ivopts.c, it does call > estimate_reg_pressure_cost function to take # of registers into > consideration. The second parameter passed as data->regs_used is supposed > to represent old register usage before IVOPTS. > > return size + estimate_reg_pressure_cost (size, data->regs_used, > data->speed, > data->body_includes_call); > > In this case, it is mere 2 by following calculation. Essentially, it only > counts > all loop invariant registers, ignoring all registers produced/used inside the > loop. > > n = 0; > for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next (&psi)) > { > phi = gsi_stmt (psi); > op = PHI_RESULT (phi); > > if (virtual_operand_p (op)) > continue; > > if (get_iv (data, op)) > continue; > > n++; > } > > EXECUTE_IF_SET_IN_BITMAP (data->relevant, 0, j, bi) > { > struct version_info *info = ver_info (data, j); > > if (info->inv_id && info->has_nonlin_use) > n++; > } > > data->regs_used = n; > > I believe how regs_used is calculated is seriously flawed, > or estimate_reg_pressure_cost is problematic if n_old is > only supposed to be loop invariant registers. Either way, > it affects how IVOPTS makes decision and could result in > worse code. What do you think? Any idea on how to improve > this?
Well, it's certainly a lower bound on the number of registers live through the whole loop execution (thus over the backedge). So they have the same cost as an induction variable as far as register pressure is concerned. What it doesn't account for is the maximum number of live registers anywhere in the loop body - but that is hard to estimate at this point in the compilation. You could compute the maximum number of live SSA names which could be an upper bound on the register pressure - but that needs liveness analysis which is expensive also that upper bound is probably way too high. So I think the current logic is sensible and simple. It's just not perfect. Maybe it's just the cost function of the IV set choosen that needs to be adjusted to account for the number of IVs in a non-linear way? That is, adjust ivopts_global_cost_for_size which just adds size to sth that pessimizes more IVs even more like size * (1 + size / (1 + data->regs_used)) or simply size ** (1. + eps) with a suitable eps < 2. Richard. > > Thanks, > Bingfeng >