Re: [PATCH] Check calls before loop unrolling

Jan Hubicka Fri, 20 Nov 2020 07:22:58 -0800

> On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote:
> > > No, the vast majority of people will *not* (consciously) use them,
> > > because the target defaults will set things to useful values.
> > >
> > > The compiler could use saner "generic" defaults perhaps, but those will
> > > still not be satisfactory for anyone (except when they aren't generic in
> > > fact but instead tuned for one arch ;-) ) -- unrolling is just too
> > > important for performance.
> > Then fix the heuristics, don't add new PARAMS :-)
> 
> I just said that cannot work?
> 
> > It didn't even occur to me until now that you may be pushing to have the
> > ppc backend have different values for the PARAMS.  I would strongly
> > discourage that.  It's been a huge headache in the s390 backend already.
> 
> It also makes a huge performance difference.  That the generic parts
> of GCC are only tuned for x86 (or not well tuned for anything?) is a
> huge roadblock for us.


As you know I spend quite some time on inliner heuristics but even after
the years I have no clear idea how the requirements differ from x86-64
to ppc, arm and s390.  Clearly, compared to x86-64, prologues may get more
expensive on ppc/arm because of more registers (so we should inline less
to cold code) and function calls are more expensive (so we should inline
more to hot code). We do have PRs for that in the testsuite, most of
which I have looked through.

The problem is that each of us has a different methodology - different
benchmarks to look at and different opinions on what is good for O2 and
O3.  From a long-term maintenance POV I am worried about changing a lot of
--param defaults in different backends simply because the meaning of
those values keeps changing (as early opts improve; we get better at
tracking optimizations during IPA passes; and our focus shifts from C
with sane inlines to basic C++ to heavily templatized C++ with many broken
inline hints to heavy C++ with LTO).

For this reason I tend to prefer not to tweak in target-specific ways
unless there is very clear evidence to do so, just because I think I will
not be able to maintain code quality testing in the future.

It would be very interesting to set up testing that could let us compare
basic arches side by side with different defaults. Our LNT testing does
a good job for x86-64 but we have basically zero coverage publicly
available on other targets and it is very hard to get inliner-relevant
benchmarks (where SPEC is not the best choice) done in a comparable way on
multiple arches.

Honza
