> We're not able to enable BB reordering with -Os. The behaviour is
> hard-coded via this if statement in rest_of_handle_reorder_blocks():
>
>   if ((flag_reorder_blocks || flag_reorder_blocks_and_partition)
>       /* Don't reorder blocks when optimizing for size because extra
>          jump insns may be created; also barrier may create extra padding.
>
>          More correctly we should have a block reordering mode that tried
>          to minimize the combined size of all the jumps.  This would more
>          or less automatically remove extra jumps, but would also try to
>          use more short jumps instead of long jumps.  */
>       && optimize_function_for_speed_p (cfun))
>     {
>       reorder_basic_blocks ();
>
> If you comment out the "&& optimize_function_for_speed_p (cfun)" then
> BB reordering takes places as desired (although this isn't a solution
> obviously).
>
> In a private message Ian indicated that this had a small impact for the
> ISA he's working with but a significant performance gain. I tried the
> same thing with the ISA I work on (Ubicom32) and this change typically
> increased code sizes by between 0.1% and 0.3% but improved performance
> by anything from 0.8% to 3% so on balance this is definitely winning
> for most of our users (this for a couple of benchmarks, the Linux
> kernel, busybox and smbd).
>
Note that commenting out the optimize_function_for_speed_p check enables BB reordering for every function, even cold ones. So I think the gains from this hacky change could grow further if, when compiling with -Os, BB reordering were applied only to hot functions. At the very least the code size increase should shrink, while hopefully the performance gains are retained.

To be clear, I am in no way suggesting this should be the default behaviour for -Os, only that it should be switchable via flags, just like other optimisations are. Once it is switchable, though, turning it on with -Os should not swing from the current universal disabling of BB reordering to universal enabling of it for every function. A sensible halfway point would select functions based on heat, so that you get the performance wins with minimal code size increases, confined to selected functions.

Cheers,
Ian