Sorry, there was a typo in my previous mail. The sentence should read:

"I also tried counting all SSA names and dividing the count by a factor. It
does NOT seem to work so well."
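
For reference, the two cost shapes discussed in the quoted thread below can be compared in isolation. This is only a rough sketch, not GCC code: reg_pressure_stub is a hypothetical, simplified stand-in for estimate_reg_pressure_cost, and the register counts and per-register costs are made-up values rather than any real target's parameters.

```c
#include <stdbool.h>

/* Hypothetical target parameters, made up for illustration only.  */
enum { AVAIL_REGS = 16, RESERVED_REGS = 3, REG_COST = 1, SPILL_COST = 4 };

/* Simplified stand-in for estimate_reg_pressure_cost; NOT the GCC code.
   n_new is the number of IVs being added, n_old the registers already
   assumed to be in use (data->regs_used in the thread).  */
static unsigned
reg_pressure_stub (unsigned n_new, unsigned n_old)
{
  unsigned needed = n_new + n_old;

  if (needed + RESERVED_REGS <= AVAIL_REGS)
    return 0;                   /* Plenty of registers left.  */
  if (needed <= AVAIL_REGS)
    return REG_COST * n_new;    /* Getting tight, but no spills yet.  */
  return SPILL_COST * n_new;    /* Assume the new IVs get spilled.  */
}

/* Current shape: linear in the number of IVs.  */
static double
linear_cost (unsigned size, unsigned regs_used)
{
  return size + reg_pressure_stub (size, regs_used);
}

/* Experimental shape from this thread: quadratic penalty on the IV count.  */
static double
nonlinear_cost (unsigned size, unsigned regs_used)
{
  return size * (1 + size * 0.2) + reg_pressure_stub (size, regs_used);
}
```

With regs_used underestimated (2 in my case), the linear form charges nothing extra until the stub's register limit is reached, whereas the quadratic term penalizes every additional IV regardless of how accurate regs_used is.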
> -----Original Message-----
> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> Sent: 20 June 2014 10:19
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: regs_used estimation in IVOPTS seriously flawed
> 
> On Fri, Jun 20, 2014 at 5:01 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> >> Sent: 20 June 2014 06:25
> >> To: Bingfeng Mei
> >> Cc: gcc@gcc.gnu.org
> >> Subject: Re: regs_used estimation in IVOPTS seriously flawed
> >>
> >> On Tue, Jun 17, 2014 at 10:59 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> >> > Hi,
> >> > I am looking at a performance regression in our code. A big loop
> >> > produces and uses a lot of temporary variables inside the loop body.
> >> > The problem appears to be that the IVOPTS pass creates even more
> >> > induction variables (from the original 2 to 27). It causes a lot of
> >> > register spilling later and performance
> >> Do you have a simplified case which can be posted here?  I guess it
> >> affects some other targets too.
> >>
> >> > takes a severe hit. I looked into tree-ssa-loop-ivopts.c; it does call
> >> > the estimate_reg_pressure_cost function to take the number of registers
> >> > into consideration. The second parameter, passed as data->regs_used, is
> >> > supposed to represent the register usage before IVOPTS.
> >> >
> >> >   return size + estimate_reg_pressure_cost (size, data->regs_used,
> >> >                                             data->speed,
> >> >                                             data->body_includes_call);
> >> >
> >> > In this case, it is a mere 2 by the following calculation. Essentially,
> >> > it only counts the loop-invariant registers, ignoring all registers
> >> > produced/used inside the loop.
> >> There are two kinds of registers produced/used inside the loop.  One
> >> kind is unrelated to induction variables; it includes non-linear uses,
> >> as mentioned by Richard.  The other kind relates to induction variable
> >> rewriting, and one issue with this kind is that the expression generated
> >> during iv use rewriting does not reflect the one estimated in ivopts
> >> very well.
> >>
> >
> > As a short term solution, I tried some simple non-linear functions as
> > Richard suggested
> 
> Oh, I misread the non-linear way as non-linear iv uses.
> 
> > to penalize using too many IVs. For example, the following cost in
> > ivopts_global_cost_for_size fixed my regression and actually improves
> > performance slightly over a set of benchmarks we usually use.
> 
> Great, I will try to tweak it on ARM.
> 
> >
> >   return size * (1 + size * 0.2)
> >           + estimate_reg_pressure_cost (size, data->regs_used, data->speed,
> >                                         data->body_includes_call);
> >
> > The trouble is that the choice of this non-linear function could be highly
> > target dependent (number of registers?). I don't have a setup to prove a
> > performance gain for other targets.
> >
> > I also tried counting all SSA names and divide it by a factor. It does
> > seem to work
> 
> So the number currently computed is a lower bound, which is too
> small.  Maybe it's possible to do some relatively low-cost analysis
> that increases the number somehow while, on the other hand, not
> restricting IVOPTS for loops with low register pressure.
> 
> Thanks,
> bin
> 
> > so well.
> >
> > Long term, if we had infrastructure to analyze the maximal number of live
> > variables in a loop at tree level, that would be great for many loop
> > optimizations.
> >
> > Thanks,
> > Bingfeng
> 
> 
> 
> --
> Best Regards.
