Re: Suboptimal bb ordering with -Os on arm

Segher Boessenkool Thu, 10 Nov 2016 18:02:56 -0800

Hi Nicolai,

On Fri, Nov 11, 2016 at 02:16:18AM +0100, Nicolai Stange wrote:
> >> From the discussion on gcc-patches [1] of what is now the aforementioned
> >> r228318 ("bb-reorder: Add -freorder-blocks-algorithm= and wire it up"),
> >> it is not clear to me whether this change can actually reduce code size
> >> beyond those 0.1% given there for -Os.
> >
> > There is r228692 as well.
> 
> Ok, summarizing, that changelog says that the simple algorithm
> potentially produced even bigger code with -Os than stc did. From that
> commit on, this remains true only on x86 and mn10300. Right?


x86 and mn10300 use STC at -Os by default.

> >> So first question:
> >> Do you guys know of any code where there are more significant code size
> >> savings achieved?
> >
> > For -O2 it is ~15%, which matters a lot for targets where STC isn't faster
> > at all (targets without cache / with tiny cache / with only cache memory).
> 
> If I understand you correctly, this means that there is a use case for
> having -O2 -freorder-blocks-algorithm=simple, right?

Yes, that is why I wrote this code at all :-)

(And then it turned out to be actually *bigger* at -Os, so I fixed that).

> >> And second question:
> >> If that isn't the case, would it possibly make sense to partly revert
> >> gcc's behaviour and set -freorder-blocks-algorithm=stc at -Os?
> >
> > -Os does many other things that are slower but smaller as well.
> 
> Sure. Let me restate my original question: assume for a moment that it
> is true that -Os with simple never produces code more than 0.1% smaller
> than what is created by -Os with stc. I haven't got any idea what the "other
> things" are able to achieve w.r.t code size savings, but to me, 0.1%
> doesn't appear to be that huge. Don't get me wrong: I *really* can't
> judge on whether 0.1% is a significant improvement or not. I'm just
> assuming that it's not. With this assumption, the question of whether
> those saved 0.1% are really worth the significantly decreased
> performance encountered in some situations seemed just natural...

It all depends on the tradeoff you want.  There are many knobs you can
turn -- for example the inlining params, which have quite some effect on
code size.

-Os is mostly -O2 except those things that increase code size.

What is the tradeoff in your case?  What is a realistic number for the
slowdown of your kernel?  Do you see hotspots in there that should be
handled better anyhow?  Etc.

> No, I want small, possibly at the cost of performance to the extent of
> what's sensible. What sensible actually is is what my question is about.

It is different for every use case I'm afraid.

> So, summarizing, I'm not asking whether I should use -O2 or -Os or
> whatever, but whether the current behaviour I'm seeing with -Os is
> intended/expected quantitatively.

With simple you get smaller code than with STC, so -Os uses simple.
If that is ridiculously slower then you won't hear me complaining if
you propose defaulting it the other way; but you haven't shown any
convincing numbers yet?
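
One way to collect the size half of such numbers is to build the same
source with both algorithms and compare the text sections. The sketch
below is only an illustration, not something from this thread: the helper
names (compile_cmd, parse_text_size, relative_growth, compare) are made
up, it assumes gcc and the binutils size tool are on PATH (pass a cross
compiler and matching size tool for arm), and it says nothing about
run-time speed, which has to be measured separately.

```python
import subprocess

# The two values -freorder-blocks-algorithm= accepts.
ALGORITHMS = ("simple", "stc")


def compile_cmd(source, obj, algorithm, opt="-Os", cc="gcc"):
    """Build the command line compiling `source` to `obj` with one algorithm."""
    return [cc, opt, f"-freorder-blocks-algorithm={algorithm}",
            "-c", source, "-o", obj]


def parse_text_size(size_output):
    """Return the 'text' column from Berkeley-format `size` output.

    Expects the usual header line followed by one line per object file;
    only the first object line is read.
    """
    header, data = size_output.strip().splitlines()[:2]
    columns = dict(zip(header.split(), data.split()))
    return int(columns["text"])


def relative_growth(simple_bytes, stc_bytes):
    """Percent by which the stc text size exceeds the simple text size."""
    return 100.0 * (stc_bytes - simple_bytes) / simple_bytes


def compare(source, opt="-Os", cc="gcc", size_tool="size"):
    """Compile `source` once per algorithm and return {algorithm: text bytes}."""
    sizes = {}
    for algo in ALGORITHMS:
        obj = f"{source}.{algo}.o"
        subprocess.run(compile_cmd(source, obj, algo, opt=opt, cc=cc),
                       check=True)
        out = subprocess.run([size_tool, obj], check=True,
                             capture_output=True, text=True).stdout
        sizes[algo] = parse_text_size(out)
    return sizes
```

Calling compare("foo.c") on a hypothetical foo.c returns the two text
sizes, and relative_growth(sizes["simple"], sizes["stc"]) puts them on
the same percentage scale as the 0.1% figure discussed above.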


Segher
