> On 01/04/2017 03:46 AM, Segher Boessenkool wrote:
> >On Wed, Jan 04, 2017 at 10:05:49AM +0100, Richard Biener wrote:
> >>>The code size is identical, but the trunk version executes one more
> >>>instruction everytime the loop runs (explicit jump to .L5 with trunk vs
> >>>fallthrough with 4.8) - it's faster only if the loop never runs. This
> >>>happens irrespective of the memory clobber inline assembler statement.
> >
> >With -Os you've asked for smaller code, not faster code.
> >
> >All of the block reordering is based on heuristics -- there is no polynomial
> >time and space algorithm to do it optimally, let alone the linear time and
> >space we need in GCC -- so there always will be cases we do not handle
> >optimally. -Os does not get as much attention as -O2 etc., as well.
> >
> >OTOH this seems to be a pretty common case that we could handle. Please
> >open a PR to keep track of this?
> I superficially looked at this a little while ago and concluded that it's
> something we ought to be able to handle. However, it wasn't critical enough
> to me to get familiar enough with the bbro code to deeply analyze -- thus I
> put it into my gcc-8 queue.
The heuristics should handle such a simple case just fine. I guess some bug
crept in over the years.

> >
> >
> >>I belive that doing BB reorder in CFG layout mode is fundamentally
> >>flawed but I guess it's wired up so that out-of-CFG layout honors
> >>EDGE_FALLTHRU.
> >
> >Why is this fundamentally flawed? The reordering is much easier this way.
> Agreed (that we ought to be doing reordering in CFG layout mode).

In fact cfglayout was invented to implement bb-reorder originally :)

Honza
>
> Jeff
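
To make the fallthrough-versus-explicit-jump trade-off from the report concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the four-block CFG, the block names and the two layouts are hypothetical and are not taken from the reporter's test case or from the bb-reorder code. It assumes a conditional branch can always be inverted, so a two-successor block needs an extra jump only when neither successor is placed right after it.

```python
# Hypothetical CFG of a simple loop: entry -> check; check -> body or exit;
# body -> check. Block names are illustrative only.
SUCCS = {
    "entry": ["check"],
    "check": ["body", "exit"],
    "body": ["check"],
    "exit": [],
}

# Two candidate layouts (assumed for illustration).
STRAIGHT = ["entry", "check", "body", "exit"]   # body ends in a jump back to check
ROTATED = ["entry", "body", "check", "exit"]    # entry jumps to check, body falls through


def _next_block(layout):
    """Map each block to the block placed immediately after it (or None)."""
    return {b: (layout[i + 1] if i + 1 < len(layout) else None)
            for i, b in enumerate(layout)}


def static_jumps(layout):
    """Number of unconditional jump instructions the layout needs (code size)."""
    nxt = _next_block(layout)
    return sum(1 for b in layout if SUCCS[b] and nxt[b] not in SUCCS[b])


def dynamic_jumps(layout, iterations):
    """Number of unconditional jumps executed when the loop body runs `iterations` times."""
    nxt = _next_block(layout)
    trace = ["entry", "check"] + ["body", "check"] * iterations + ["exit"]
    executed = 0
    for cur, succ in zip(trace, trace[1:]):
        succs = SUCCS[cur]
        if len(succs) == 1:
            # Single successor: a jump runs unless the successor is the fallthrough.
            executed += nxt[cur] != succ
        elif nxt[cur] not in succs:
            # Neither successor follows: assume a branch to succs[0] and a jump to succs[1].
            executed += succ == succs[1]
    return executed
```

In this model both layouts need exactly one unconditional jump, so the code size is the same, but STRAIGHT executes that jump once per iteration while ROTATED executes it once per loop entry. STRAIGHT wins only when the loop runs zero times, which is the behaviour described in the report.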