On Thu, Aug 13, 2015 at 6:26 PM, sa...@hederstierna.com
<fred...@hederstierna.com> wrote:
> Hi
> I'm using an ARM Thumb cross compiler for embedded systems and always
> optimize for small size with -Os.
>
> However, I've been experimenting with optimization flags and loop unrolling.
>
> Normally loop unrolling is bad for size: code is duplicated and size
> increases.
>
> However, I discovered that in some special cases where the number of
> iterations is very small, e.g. a loop that runs 2-3 times,
> unrolling can make the code smaller, e.g. by freeing up the
> registers used for the loop index.
>
> For example, when I use the flag "-fpeel-loops" together with -Os, I get
> smaller code size in 99% of cases for the ARM Thumb target.
>
> So my question is how unrolling works with -Os: is it always totally
> disabled,
> or are there some cases where it could be tried, e.g. with a small number
> of iterations, so that the loop can be eliminated?
>
> Could e.g. "-fpeel-loops" be enabled by default for -Os perhaps? Now it's only
> enabled for -O2 and above, I think.

Complete peeling is already enabled with -Os; it is just restricted to
those cases where GCC's cost model for the
unrolling operation determines that the code size shrinks.  If you enable
-fpeel-loops then the cost model allows the
code size to grow, something not (always) intended with -Os.
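
To illustrate what complete peeling means here, the following is a
minimal sketch, not GCC output. The function names sum3_loop and
sum3_peeled are hypothetical. The second function is a hand-written
picture of what fully peeling a loop with a constant trip count of 3
conceptually leaves behind: no index, no compare, no branch.

```c
#include <stdint.h>

/* Hypothetical example: a loop whose trip count (3) is known at
 * compile time. The index i needs a register, plus a compare and a
 * branch on every iteration. */
uint32_t sum3_loop(const uint32_t *v)
{
    uint32_t s = 0;
    for (int i = 0; i < 3; i++)
        s += v[i];
    return s;
}

/* Hand-written equivalent of the completely peeled loop: straight-line
 * code with constant offsets and no loop bookkeeping. Whether this is
 * actually smaller depends on the target and on the loop body. */
uint32_t sum3_peeled(const uint32_t *v)
{
    return v[0] + v[1] + v[2];
}
```

Whether the compiler performs this transformation at -Os depends on
whether its size estimate for the peeled form comes out smaller than
the loop.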

The solution is of course to improve the cost modeling and GCC's idea
of follow-up optimization opportunities.
I do have some incomplete patches to improve that and hope to get back
to it for GCC 6.

If you have (small) testcases that show code size improvements with
-Os -fpeel-loops over -Os and you are
confident they are caused by unrolling, please open a Bugzilla report containing them.
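
A reduced testcase could be as small as the sketch below. The file
name peel-test.c, the function scale_pair and the arm-none-eabi-
toolchain prefix are assumptions for illustration; substitute whatever
your cross toolchain uses. No particular size result is implied.

```c
/* peel-test.c: hypothetical reduced testcase for a bug report. */
#include <stdint.h>

#define N 2  /* small, compile-time-constant trip count */

/* A tiny loop of the kind described above: few iterations, simple body. */
void scale_pair(int16_t *dst, const int16_t *src, int16_t k)
{
    for (int i = 0; i < N; i++)
        dst[i] = (int16_t)(src[i] * k);
}

/*
 * Compare the two builds (toolchain prefix is an assumption):
 *
 *   arm-none-eabi-gcc -mthumb -Os -c peel-test.c -o os.o
 *   arm-none-eabi-gcc -mthumb -Os -fpeel-loops -c peel-test.c -o os-peel.o
 *   arm-none-eabi-size os.o os-peel.o
 *
 * Adding -fdump-tree-cunroll-details to both commands should show
 * whether the complete-unrolling pass touched the loop, which helps
 * confirm that any size difference really comes from unrolling.
 */
```

Attaching the source, the exact command lines and the two size
outputs to the report makes the difference easy to reproduce.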

Thanks,
Richard.

> Thanks and Best Regards
> Fredrik
