Segher Boessenkool writes:

> On Wed, Jan 04, 2017 at 10:05:49AM +0100, Richard Biener wrote:
>> > The code size is identical, but the trunk version executes one more
>> > instruction every time the loop runs (explicit jump to .L5 with trunk vs
>> > fallthrough with 4.8) - it's faster only if the loop never runs. This
>> > happens irrespective of the memory clobber inline assembler statement.
>
> With -Os you've asked for smaller code, not faster code.
>
> All of the block reordering is based on heuristics -- there is no polynomial
> time and space algorithm to do it optimally, let alone the linear time and
> space we need in GCC -- so there always will be cases we do not handle
> optimally.  Also, -Os does not get as much attention as -O2 etc.
>
> OTOH this seems to be a pretty common case that we could handle.  Please
> open a PR to keep track of this?
>

Filed PR 79012 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79012)

Regards
Senthil
