Segher Boessenkool writes:

> On Wed, Jan 04, 2017 at 10:05:49AM +0100, Richard Biener wrote:
>> > The code size is identical, but the trunk version executes one more
>> > instruction everytime the loop runs (explicit jump to .L5 with trunk vs
>> > fallthrough with 4.8) - it's faster only if the loop never runs. This
>> > happens irrespective of the memory clobber inline assembler statement.
>
> With -Os you've asked for smaller code, not faster code.
>
> All of the block reordering is based on heuristics -- there is no polynomial
> time and space algorithm to do it optimally, let alone the linear time and
> space we need in GCC -- so there always will be cases we do not handle
> optimally. -Os does not get as much attention as -O2 etc., as well.
>
> OTOH this seems to be a pretty common case that we could handle. Please
> open a PR to keep track of this?

Filed PR 79012 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79012)

Regards
Senthil