Sorry, typo in previous mail. "I also tried counting all SSA names and divide it by a factor. It does NOT seem to work so well"
> -----Original Message-----
> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> Sent: 20 June 2014 10:19
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: regs_used estimation in IVOPTS seriously flawed
>
> On Fri, Jun 20, 2014 at 5:01 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> >> Sent: 20 June 2014 06:25
> >> To: Bingfeng Mei
> >> Cc: gcc@gcc.gnu.org
> >> Subject: Re: regs_used estimation in IVOPTS seriously flawed
> >>
> >> On Tue, Jun 17, 2014 at 10:59 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> >> > Hi,
> >> > I am looking at a performance regression in our code. A big loop produces
> >> > and uses a lot of temporary variables inside the loop body. The problem
> >> > appears that IVOPTS pass creates even more induction variables (from original
> >> > 2 to 27). It causes a lot of register spilling later and performance
> >> Do you have a simplified case which can be posted here? I guess it
> >> affects some other targets too.
> >>
> >> > take a severe hit. I looked into tree-ssa-loop-ivopts.c, it does call
> >> > estimate_reg_pressure_cost function to take # of registers into
> >> > consideration. The second parameter passed as data->regs_used is supposed
> >> > to represent old register usage before IVOPTS.
> >> >
> >> >   return size + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
> >> >                                             data->body_includes_call);
> >> >
> >> > In this case, it is mere 2 by following calculation. Essentially, it only counts
> >> > all loop invariant registers, ignoring all registers produced/used inside the loop.
> >> There are two kinds of registers produced/used inside the loop. One
> >> is induction variable irrelevant, it includes non-linear uses as
> >> mentioned by Richard. The other kind relates to induction variable
> >> rewrite, and one issue with this kind is expression generated during
> >> iv use rewriting is not reflecting the estimated one in ivopt very
> >> well.
> >>
> >
> > As a short term solution, I tried some simple non-linear functions as Richard suggested
>
> Oh, I misread the non-linear way as non-linear iv uses.
>
> > to penalize using too many IVs. For example, the following cost in
> > ivopts_global_cost_for_size fixed my regression and actually improves performance
> > slightly over a set of benchmarks we usually use.
>
> Great, I will try to tweak it on ARM.
> >
> >   return size * (1 + size * 0.2)
> >          + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
> >                                        data->body_includes_call);
> >
> > The trouble is choice of this non-linear function could be highly target dependent
> > (# of registers?). I don't have setup to prove performance gain for other targets.
> >
> > I also tried counting all SSA names and divide it by a factor. It does seem to work
>
> So the number currently computed is the lower bound which is too
> small. Maybe it's possible to do some analysis with relatively low
> cost increasing the number somehow. While on the other hand, doesn't
> bring restriction to IVOPT for loops with low register pressure.
>
> Thanks,
> bin
>
> > so well.
> >
> > Long term, if we have infrastructure to analyze maximal live variable in a loop
> > at tree-level, that would be great for many loop optimizations.
> >
> > Thanks,
> > Bingfeng
> >
>
> --
> Best Regards.
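
For illustration only, here is a rough standalone sketch of how the two size terms quoted above scale with the number of IVs. It is not GCC code: linear_cost, superlinear_cost and reg_pressure_stub are hypothetical names, and the stub simply returns 0 in place of estimate_reg_pressure_cost so that only the size term varies. The 0.2 factor is written as an integer division by 5, which is an assumption about rounding.

```c
/* Hypothetical stand-in for the register-pressure term.  This is NOT
   GCC's estimate_reg_pressure_cost; it returns 0 so that the two size
   terms below can be compared in isolation.  */
static unsigned
reg_pressure_stub (unsigned size)
{
  (void) size;
  return 0;
}

/* Linear size term, mirroring "return size + ..." from the thread.  */
static unsigned
linear_cost (unsigned size)
{
  return size + reg_pressure_stub (size);
}

/* Superlinear size term, mirroring "size * (1 + size * 0.2)" from the
   thread, expressed in integer arithmetic as size + size * size / 5.  */
static unsigned
superlinear_cost (unsigned size)
{
  return size + size * size / 5 + reg_pressure_stub (size);
}
```

Under these assumptions the two terms agree for the original 2 IVs, while for 27 IVs the superlinear term is several times the linear one, which is the kind of penalty the experiment relies on.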