http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
--- Comment #2 from Teresa Johnson <tejohnson at google dot com> ---
On Tue, Jul 30, 2013 at 2:00 PM, olegendo at gcc dot gnu.org
<gcc-bugzi...@gcc.gnu.org> wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
>
>             Bug ID: 58033
>            Summary: counterproductive bb-reorder
>            Product: gcc
>            Version: 4.9.0
>             Status: UNCONFIRMED
>           Severity: normal
>           Priority: P3
>          Component: rtl-optimization
>           Assignee: unassigned at gcc dot gnu.org
>           Reporter: olegendo at gcc dot gnu.org
>                 CC: steven at gcc dot gnu.org, tejohnson at google dot com
>             Target: sh*-*-*
>
> On SH, compiling the following code with -O2
>
> #include <bitset>
>
> std::bitset<32> make_bits (void)
> {
>   std::bitset<32> r;
>   for (auto&& i : { 4, 5, 6, 10 })
>     if (i < r.size ())
>       r.set (i);
>
>   return r;
> }
>
> results in the following code:
>
>         mov.l   .L8,r1
>         mov     #0,r0
>         mov     #31,r7
>         mov     #1,r6
>         mov     #4,r2
> .L2:
>         mov.l   @r1,r3
>         cmp/hi  r7,r3
>         bf/s    .L7

I assume it is the above branch that is the issue (not the bf/s .L2
below, as that is the same in both versions of the code).

I'm assuming this is not built with FDO? In that case bbro is probably
at the mercy of whatever probabilities the static heuristics assigned
to the branches. Although if it is 50-50 then I'm not sure offhand what
happens - maybe it is biasing in favor of having the shortest trace?
This is a great test case for motivating range propagation. =)

Can you attach the dump created with -fdump-rtl-bbro-all? We can see
what the edge probabilities are. For some reason it is not compiling
for me - what options do you use? My (4_7-based) g++ is complaining
about the "auto":

$ g++ -O2 pr58033.cc -S
pr58033.cc: In function 'std::bitset<32ul> make_bits()':
pr58033.cc:6:12: error: expected unqualified-id before '&&' token
   for (auto&& i : { 4, 5, 6, 10 })

Teresa

>         mov     r6,r5
> .L3:
>         dt      r2
>         bf/s    .L2    // branch if value not > 31, i.e. in each iteration
>         add     #4,r1
>         rts
>         nop
>         .align 1
> .L7:
>         shld    r3,r5
>         bra     .L3
>         or      r5,r0
> .L9:
>         .align 2
> .L8:
>         .long   _._45+0
>
> _._45:
>         .long   4
>         .long   5
>         .long   6
>         .long   10
>
> Disabling the bb-reorder pass or using -Os results in more compact and
> faster code:
>
>         mov.l   .L7,r1
>         mov     #0,r0
>         mov     #31,r7
>         mov     #1,r6
>         mov     #4,r2
> .L2:
>         mov.l   @r1,r3
>         cmp/hi  r7,r3
>         bt/s    .L3    // branch if value > 31, i.e. never.
>         mov     r6,r5
>         shld    r3,r5
>         or      r5,r0
> .L3:
>         dt      r2
>         bf/s    .L2
>         add     #4,r1
>         rts
>         nop
>
> Of course the bb-reorder pass doesn't know that the values in this case are
> always in range. Still, maybe it could be improved by not splitting out a BB
> if it consists only of a few insns? I've tried increasing the branch cost but
> it won't do anything.
>
> Teresa, Steven,
>
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
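
A note for anyone trying to reproduce this: the test case uses a range-based for loop and auto&&, so it most likely needs C++11 mode, which would explain the "expected unqualified-id before '&&' token" error quoted in the reply. The sketch below repeats the test case in a form that should build with -std=c++11, and adds a second function, make_bits_hinted, which is hypothetical and not part of the bug report. It marks the range check as likely with __builtin_expect, as one possible way to check whether the static branch probabilities are what drive the block placement. Whether this changes the bb-reorder result on SH is an assumption that has not been verified here.

```cpp
// Assumed command line: g++ -std=c++11 -O2 -S pr58033.cc
#include <bitset>
#include <initializer_list>

// Test case as given in the report.
std::bitset<32> make_bits (void)
{
  std::bitset<32> r;
  for (auto&& i : { 4, 5, 6, 10 })
    if (i < r.size ())
      r.set (i);

  return r;
}

// Hypothetical variant (not from the report): the same loop, with the
// in-range check marked as likely via __builtin_expect, so that the
// hint takes the place of the statically guessed probability.
std::bitset<32> make_bits_hinted (void)
{
  std::bitset<32> r;
  for (auto&& i : { 4, 5, 6, 10 })
    if (__builtin_expect (i < r.size (), 1))
      r.set (i);

  return r;
}
```

Both functions set bits 4, 5, 6 and 10 and return the same value. Only the generated code is expected to differ, which can be compared in the assembly output or in the -fdump-rtl-bbro-all dump.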