On 01/04/2017 03:46 AM, Segher Boessenkool wrote:
On Wed, Jan 04, 2017 at 10:05:49AM +0100, Richard Biener wrote:
The code size is identical, but the trunk version executes one more
instruction every time the loop runs (explicit jump to .L5 with trunk vs
fallthrough with 4.8) - it's faster only if the loop never runs. This
happens irrespective of the memory clobber inline assembler statement.
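
To make the size-versus-executed-instructions distinction concrete, here is a minimal sketch. It is not the code from this report: the four-block CFG, the block names, and both layouts are hypothetical, and conditional branches are assumed to cost the same in either layout. It only counts unconditional jumps, which a single-successor block needs whenever its successor is not placed directly after it.

```python
# Hypothetical loop CFG (an assumption for illustration, not the reported
# test case): each block maps to its list of successors.
CFG = {
    "init": ["test"],          # single successor
    "test": ["body", "exit"],  # conditional branch
    "body": ["test"],          # single successor (back to the test)
    "exit": [],
}

# Two hypothetical layouts of the same blocks. In both, one successor of
# "test" is placed right after it, so "test" needs only its conditional branch.
TEST_FIRST = ["init", "test", "body", "exit"]
ROTATED = ["init", "body", "test", "exit"]


def static_jumps(layout):
    """Number of unconditional jump instructions the layout needs (code size)."""
    following = dict(zip(layout, layout[1:]))
    return sum(
        1
        for block, succs in CFG.items()
        if len(succs) == 1 and following.get(block) != succs[0]
    )


def executed_jumps(layout, iterations):
    """Number of unconditional jumps executed when the loop runs `iterations` times."""
    following = dict(zip(layout, layout[1:]))
    trace = ["init", "test"] + ["body", "test"] * iterations + ["exit"]
    jumps = 0
    for src, dst in zip(trace, trace[1:]):
        # Only a single-successor block that cannot fall through pays a jump.
        if len(CFG[src]) == 1 and following.get(src) != dst:
            jumps += 1
    return jumps


for n in (0, 1, 3):
    print(n, executed_jumps(TEST_FIRST, n), executed_jumps(ROTATED, n))
```

Both layouts need exactly one jump instruction, so they are the same size, yet the number of jumps executed differs: under these assumptions one layout pays a jump per iteration, while the other pays a single jump at entry and so loses only when the loop body never runs.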

With -Os you've asked for smaller code, not faster code.

All of the block reordering is based on heuristics -- there is no polynomial
time and space algorithm to do it optimally, let alone the linear time and
space we need in GCC -- so there always will be cases we do not handle
optimally.  Also, -Os does not get as much attention as -O2 etc.

OTOH this seems to be a pretty common case that we could handle.  Please
open a PR to keep track of this?
I superficially looked at this a little while ago and concluded that it's something we ought to be able to handle. However, it wasn't critical enough to me to get familiar enough with the bbro code to deeply analyze -- thus I put it into my gcc-8 queue.



I believe that doing BB reorder in CFG layout mode is fundamentally
flawed but I guess it's wired up so that out-of-CFG layout honors
EDGE_FALLTHRU.

Why is this fundamentally flawed?  The reordering is much easier this way.
Agreed (that we ought to be doing reordering in CFG layout mode).

Jeff
