On Thu, Aug 13, 2015 at 6:26 PM, sa...@hederstierna.com
<fred...@hederstierna.com> wrote:
> Hi
> I'm using an ARM Thumb cross compiler for embedded systems and always
> optimize for small size with -Os.
>
> I've also experimented with optimization flags and loop unrolling.
>
> Normally loop unrolling is bad for size: code is duplicated and size
> increases.
>
> However, I discovered that in some special cases where the number of
> iterations is very small, e.g. a loop of 2-3 iterations, unrolling can
> make the code smaller, e.g. by freeing up registers used for loop
> indices.
>
> For example, when I use the flag "-fpeel-loops" together with -Os, I get
> smaller code size in 99% of cases for the ARM Thumb target.
>
> So my question is how unrolling works with -Os: is it always totally
> disabled, or are there some cases where it could be tried, e.g. with a
> small number of iterations, so that the loop can be eliminated?
>
> Could e.g. "-fpeel-loops" be enabled by default for -Os, perhaps? Right
> now it's only enabled for -O2 and above, I think.

Complete peeling is already enabled with -Os; it is just restricted to
those cases where GCC's cost model for the unrolling operation determines
that the code size shrinks. If you enable -fpeel-loops, the cost model
allows the code size to grow, which is not (always) intended with -Os.

The solution is of course to improve the cost modeling and GCC's idea of
follow-up optimization opportunities. I do have some incomplete patches
to improve that and hope to get back to them for GCC 6.

If you have (small) testcases that show code size improvements with
-Os -fpeel-loops over -Os, and you are confident they are caused by
unrolling, please open a bugzilla containing them.

Thanks,
Richard.

> Thanks and Best Regards
> Fredrik
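
For illustration only, a small testcase of the kind asked for in the reply might look like the sketch below. It is not taken from the thread: the function name sum3 and the trip count of 3 are assumptions, chosen to match the "loop of 2-3 iterations" case described in the question. Whether -fpeel-loops actually shrinks this particular function depends on the GCC version and target, so the sizes need to be measured rather than assumed.

```c
/* Hypothetical testcase sketch; sum3 is an illustrative name,
   not code from the original thread. */
#include <stdint.h>

/* Fixed trip count of 3: if the loop is completely peeled, the
   compiler no longer needs an index register or a loop branch. */
uint32_t sum3(const uint8_t *p)
{
    uint32_t s = 0;
    for (int i = 0; i < 3; i++)
        s += p[i];
    return s;
}
```

Assuming a bare-metal toolchain named arm-none-eabi-gcc (the toolchain used in the thread is not stated), the two variants could be built with "arm-none-eabi-gcc -mthumb -Os -c test.c -o os.o" and "arm-none-eabi-gcc -mthumb -Os -fpeel-loops -c test.c -o peel.o", and the text sizes compared with "arm-none-eabi-size os.o peel.o". Adding -S instead of -c gives the assembly, which helps confirm that a size difference really comes from peeling.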