On Wed, Jan 04, 2017 at 10:05:49AM +0100, Richard Biener wrote:
> > The code size is identical, but the trunk version executes one more
> > instruction every time the loop runs (explicit jump to .L5 with trunk vs
> > fallthrough with 4.8) - it's faster only if the loop never runs. This
> > happens irrespective of the memory clobber inline assembler statement.

With -Os you've asked for smaller code, not faster code.

All of the block reordering is based on heuristics -- there is no
polynomial time and space algorithm to do it optimally, let alone the
linear time and space we need in GCC -- so there always will be cases
we do not handle optimally.  -Os also does not get as much attention as
-O2 etc.

OTOH this seems to be a pretty common case that we could handle.  Please
open a PR to keep track of this?

> I believe that doing BB reorder in CFG layout mode is fundamentally
> flawed but I guess it's wired up so that out-of-CFG layout honors
> EDGE_FALLTHRU.

Why is this fundamentally flawed?  The reordering is much easier this
way.

> In any way, why does BB reorder not "fix" the "bogus" reorder
> into-CFG-layout performs?

I'm not sure what bogus reorder you're talking about here?
cfg_layout_initialize should not reorder anything (other than the usual
things cleanup_cfg does)?


Segher