Hi Segher, thanks for your prompt reply!
Segher Boessenkool <seg...@kernel.crashing.org> writes:

> On Fri, Nov 11, 2016 at 12:03:44AM +0100, Nicolai Stange wrote:
>> in the course of doing some benchmarks on arm with -Os, I noticed that
>> some list traversal code became significantly slower since gcc 5.3 when
>> instruction caches are cold.
>
> But is it smaller?  This tiny example function is not, but on average?

The Linux kernel's .text for my config at hand is smaller by ~0.1% with
simple than with stc.

I gave this tiny example only to demonstrate the bb ordering issue I was
talking about. Of course, it's made up. So in particular, it was not
meant to show anything related to code size.

> If you care about speed instead of size, you should not use -Os.

Indeed.

>> That being said, I could certainly go and submit a patch to the Linux
>> kernel setting -freorder-blocks-algorithm=stc for the -Os case.
>
> Or do not set CONFIG_CC_OPTIMIZE_FOR_SIZE in your kernel config.

Yes, of course.

>> From the discussion on gcc-patches [1] of what is now the aforementioned
>> r228318 ("bb-reorder: Add -freorder-blocks-algorithm= and wire it up"),
>> it is not clear to me whether this change can actually reduce code size
>> beyond those 0.1% given there for -Os.
>
> There is r228692 as well.

Ok, summarizing, that changelog says that the simple algorithm
potentially produced even bigger code with -Os than stc did. From that
commit on, this remains true only on x86 and mn10300. Right?

>> So first question:
>> Do you guys know of any code where there are more significant code size
>> savings achieved?
>
> For -O2 it is ~15%, which matters a lot for targets where STC isn't faster
> at all (targets without cache / with tiny cache / with only cache memory).

If I understand you correctly, this means that there is a use case for
having -O2 -freorder-blocks-algorithm=simple, right?

My question is about whether switching the default algorithm for -Os
might make sense, cf. below.

>> And second question:
>> If that isn't the case, would it possibly make sense to partly revert
>> gcc's behaviour and set -freorder-blocks-algorithm=stc at -Os?
>
> -Os does many other things that are slower but smaller as well.

Sure. Let me restate my original question: assume for a moment that
-Os with simple never produces code more than 0.1% smaller than what
-Os with stc creates.

I haven't got any idea what the "other things" are able to achieve
w.r.t. code size savings, but to me, 0.1% doesn't appear to be that
huge.

Don't get me wrong: I *really* can't judge whether 0.1% is a
significant improvement or not. I'm just assuming that it's not. With
this assumption, the question of whether those saved 0.1% are really
worth the significantly decreased performance encountered in some
situations seemed only natural...

> There is no way to ask for somewhat fast and somewhat small at the
> same time, which seems to be what you want?

No, I want small, possibly at the cost of performance, to the extent
that this is sensible. What "sensible" actually means is what my
question is about.

Example: a (hypothetical) code size saving of 0.00000000001% at the
cost of 10000000000x slower code certainly isn't sensible. But 0.1% at
the cost of some additional 0.5us here and there -- no clue.

So, summarizing, I'm not asking whether I should use -O2 or -Os or
whatever, but whether the current behaviour I'm seeing with -Os is
intended/expected quantitatively.

Thank you!

Nicolai
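
P.S.: For reference, a minimal sketch of the kind of code I have in
mind -- purely illustrative, not the actual function I benchmarked --
i.e. a plain linked list traversal whose loop blocks the two reordering
algorithms may lay out differently:

  struct node {
          struct node *next;
          int key;
  };

  /* Return the first node whose ->key matches, or NULL. */
  struct node *find(struct node *head, int key)
  {
          struct node *n;

          for (n = head; n; n = n->next)
                  if (n->key == key)
                          return n;

          return NULL;
  }

Comparing the assembly from

  gcc -Os -freorder-blocks-algorithm=simple -S find.c

against

  gcc -Os -freorder-blocks-algorithm=stc -S find.c

is how I've been looking at the differences in basic block placement.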