Hi Nicolai,

On Fri, Nov 11, 2016 at 02:16:18AM +0100, Nicolai Stange wrote:
> >> From the discussion on gcc-patches [1] of what is now the
> >> aforementioned r228318 ("bb-reorder: Add -freorder-blocks-algorithm=
> >> and wire it up"), it is not clear to me whether this change can
> >> actually reduce code size beyond those 0.1% given there for -Os.
> >
> > There is r228692 as well.
>
> Ok, summarizing, that changelog says that the simple algorithm
> potentially produced even bigger code with -Os than stc did. From
> that commit on, this remains true only on x86 and mn10300. Right?

x86 and mn10300 use STC at -Os by default.

> >> So first question:
> >> Do you guys know of any code where there are more significant code
> >> size savings achieved?
> >
> > For -O2 it is ~15%, which matters a lot for targets where STC isn't
> > faster at all (targets without cache / with tiny cache / with only
> > cache memory).
>
> If I understand you correctly, this means that there is a use case
> for having -O2 -freorder-blocks-algorithm=simple, right?

Yes, that is why I wrote this code at all :-)  (And then it turned out
to be actually *bigger* at -Os, so I fixed that.)

> >> And second question:
> >> If that isn't the case, would it possibly make sense to partly
> >> revert gcc's behaviour and set -freorder-blocks-algorithm=stc
> >> at -Os?
> >
> > -Os does many other things that are slower but smaller as well.
>
> Sure. Let me restate my original question: assume for a moment that
> -Os with simple never produces code more than 0.1% smaller than what
> -Os with stc creates. I haven't got any idea what the "other things"
> are able to achieve w.r.t. code size savings, but to me, 0.1% doesn't
> appear to be that huge. Don't get me wrong: I *really* can't judge
> whether 0.1% is a significant improvement or not. I'm just assuming
> that it's not. With this assumption, the question of whether those
> saved 0.1% are really worth the significantly decreased performance
> encountered in some situations seemed just natural...

It all depends on the tradeoff you want. There are many knobs you can
turn -- for example the inlining params, which have quite some effect
on code size. -Os is mostly -O2, minus the things that would increase
code size.
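For instance, here is one way to look at both knobs on a single
translation unit (foo.c is just a stand-in for your own code, and the
--param value below is an arbitrary illustration, not a recommended
setting):

  # compare the two block-reordering algorithms at -Os
  gcc -Os -freorder-blocks-algorithm=stc    -c foo.c -o foo-stc.o
  gcc -Os -freorder-blocks-algorithm=simple -c foo.c -o foo-simple.o
  size foo-stc.o foo-simple.o

  # the inlining params are another size/speed knob
  gcc -Os --param max-inline-insns-auto=20 -c foo.c -o foo-inl.o
  size foo-inl.o

Comparing the text column of the size output gives you a first number
for the size side of the tradeoff.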
What is the tradeoff in your case? What is a realistic number for the
slowdown of your kernel? Do you see hotspots in there that should be
handled better anyhow? Etc.

> No, I want small, possibly at the cost of performance to the extent
> of what's sensible. What "sensible" actually is, is what my question
> is about.

It is different for every use case, I'm afraid.

> So, summarizing, I'm not asking whether I should use -O2 or -Os or
> whatever, but whether the current behaviour I'm seeing with -Os is
> intended/expected quantitatively.

With simple you get smaller code than with STC, so -Os uses simple. If
that is ridiculously slower, then you won't hear me complaining if you
propose defaulting it the other way; but you haven't shown any
convincing numbers yet?


Segher