Hi Nicolai,

On Fri, Nov 11, 2016 at 12:03:44AM +0100, Nicolai Stange wrote:
> in the course of doing some benchmarks on arm with -Os, I noticed that
> some list traversion code became significantly slower since gcc 5.3 when
> instruction caches are cold.

But is it smaller?  This tiny example function is not, but on average?

If you care about speed instead of size, you should not use -Os.

> That being said, I could certainly go and submit a patch to the Linux
> kernel setting -freorder-blocks-algorithm=stc for the -Os case.

Or do not set CONFIG_CC_OPTIMIZE_FOR_SIZE in your kernel config.

> From the discussion on gcc-patches [1] of what is now the aforementioned
> r228318 ("bb-reorder: Add -freorder-blocks-algorithm= and wire it up"),
> it is not clear to me whether this change can actually reduce code size
> beyond those 0.1% given there for -Os.

There is r228692 as well.

> So first question:
> Do you guys know of any code where there are more significant code size
> savings achieved?

For -O2 it is ~15%, which matters a lot for targets where STC isn't faster
at all (targets without cache / with tiny cache / with only cache memory).

> And second question:
> If that isn't the case, would it possibly make sense to partly revert
> gcc's behaviour and set -freorder-blocks-algorithm=stc at -Os?

-Os does many other things that are slower but smaller as well.  There
is no way to ask for somewhat fast and somewhat small at the same time,
which seems to be what you want?


Segher