> We're not able to enable BB reordering with -Os.  The behaviour is
> hard-coded via this if statement in rest_of_handle_reorder_blocks():
> 
>   if ((flag_reorder_blocks || flag_reorder_blocks_and_partition)
>       /* Don't reorder blocks when optimizing for size because extra
>        jump insns may be created; also barrier may create extra
>        padding.
> 
>        More correctly we should have a block reordering mode that
>        tried to minimize the combined size of all the jumps.  This
>        would more or less automatically remove extra jumps, but would
>        also try to use more short jumps instead of long jumps.  */
>       && optimize_function_for_speed_p (cfun))
>     {
>       reorder_basic_blocks ();
> 
> If you comment out the "&& optimize_function_for_speed_p (cfun)" then
> BB reordering takes place as desired (although this isn't a solution
> obviously).
> 
> In a private message Ian indicated that this had a small impact for
> the ISA he's working with but a significant performance gain.  I tried the
> same thing with the ISA I work on (Ubicom32) and this change typically
> increased code sizes by between 0.1% and 0.3% but improved performance
> by anything from 0.8% to 3% so on balance this is definitely winning
> for most of our users (this for a couple of benchmarks, the Linux
> kernel, busybox and smbd).
> 

Note that commenting out the optimize_function_for_speed_p (cfun) check
enables BB reordering for all functions, even cold ones.  So I think the
gains from this hacky change could increase further if, when compiling
with -Os, BB reordering were enabled only for hot functions.  (Certainly
the code size increase could be minimised, whilst hopefully retaining
the performance gains.)

Note that I am in no way suggesting this should be the default
behaviour for -Os, only that it should be switchable via the flags,
just like other optimisations are.  Once it is switchable, though, I
would expect turning it on for -Os not to enable BB reordering for
every function (the opposite of the current universal disabling), but
to pick a sensible half-way point based on heat, so that you get the
performance wins with minimal code size increases on selected
functions.
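
To make the idea concrete, here is a rough, standalone sketch of the two
gating policies.  It is not GCC code: the enum, the struct and both
function names are hypothetical stand-ins invented for illustration, and
the "current" gate is a simplification that only models the -Os case
described above.

```c
#include <stdbool.h>

/* Hypothetical per-function hotness estimate; not a GCC type.  */
enum fn_heat { FN_COLD, FN_NORMAL, FN_HOT };

/* Hypothetical inputs to the gate; these only model the real state.  */
struct gate_inputs
{
  bool reorder_flag;       /* Models a block reordering flag being set.  */
  bool optimize_for_size;  /* Models compiling with -Os.  */
  enum fn_heat heat;       /* Models how hot the function is thought to be.  */
};

/* Simplified model of the current behaviour: when optimizing for
   size, blocks are never reordered, whatever the flags say.  */
static bool
gate_current (const struct gate_inputs *in)
{
  return in->reorder_flag && !in->optimize_for_size;
}

/* Sketch of the suggested behaviour: the flag stays in control, but
   under -Os only hot functions are reordered.  */
static bool
gate_proposed (const struct gate_inputs *in)
{
  if (!in->reorder_flag)
    return false;            /* Reordering was not requested at all.  */
  if (!in->optimize_for_size)
    return true;             /* Not -Os: same as the current model.  */
  return in->heat == FN_HOT; /* -Os: pay the size cost only where it helps.  */
}
```

With this model, a hot function compiled with -Os and the flag set is
reordered under the proposed gate but not under the current one, while a
cold function is left alone by both.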

Cheers,
Ian
