http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033

--- Comment #2 from Teresa Johnson <tejohnson at google dot com> ---
On Tue, Jul 30, 2013 at 2:00 PM, olegendo at gcc dot gnu.org
<gcc-bugzi...@gcc.gnu.org> wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58033
>
>             Bug ID: 58033
>            Summary: counterproductive bb-reorder
>            Product: gcc
>            Version: 4.9.0
>             Status: UNCONFIRMED
>           Severity: normal
>           Priority: P3
>          Component: rtl-optimization
>           Assignee: unassigned at gcc dot gnu.org
>           Reporter: olegendo at gcc dot gnu.org
>                 CC: steven at gcc dot gnu.org, tejohnson at google dot com
>             Target: sh*-*-*
>
> On SH, compiling the following code with -O2
>
> #include <bitset>
>
> std::bitset<32> make_bits (void)
> {
>   std::bitset<32> r;
>   for (auto&& i : { 4, 5, 6, 10 })
>     if (i < r.size ())
>       r.set (i);
>
>   return r;
> }
>
> results in the following code:
>
>         mov.l   .L8,r1
>         mov     #0,r0
>         mov     #31,r7
>         mov     #1,r6
>         mov     #4,r2
> .L2:
>         mov.l   @r1,r3
>         cmp/hi  r7,r3
>         bf/s    .L7

I assume the issue is the branch above (bf/s .L7), not the bf/s .L2
below, since the latter is the same in both versions of the code. I'm
assuming this was not built with FDO? In that case bbro is probably at
the mercy of whatever probabilities the static heuristics assigned to
the branches. If it is 50-50, though, I'm not sure offhand what
happens - maybe it is biased in favor of the shortest trace?
This is a great test case for motivating range propagation. =)
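
One way to probe whether the static probabilities are what drives the
layout, short of FDO, might be to annotate the range check with GCC's
__builtin_expect. This is only a sketch: the name make_bits_hinted is
made up for illustration, and I have not checked whether the hint
actually changes what bbro does on SH.

```cpp
#include <bitset>
#include <initializer_list>

// Hypothetical variant of the reporter's test case (the name is not from
// the bug report). The only change is the __builtin_expect hint, which
// tells GCC's static branch prediction that the range check is expected
// to succeed.
std::bitset<32> make_bits_hinted (void)
{
  std::bitset<32> r;
  for (auto&& i : { 4, 5, 6, 10 })
    if (__builtin_expect (i < r.size (), 1))
      r.set (i);

  return r;
}
```

If the hinted version keeps the shld/or sequence on the fall-through
path, that would point at the heuristic probabilities rather than at
bbro itself.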

Can you attach the dump created with -fdump-rtl-bbro-all? Then we can
see what the edge probabilities are. For some reason the test case is
not compiling for me - what options did you use? My (4_7-based) g++ is
complaining about the "auto":

$ g++ -O2 pr58033.cc -S
pr58033.cc: In function 'std::bitset<32ul> make_bits()':
pr58033.cc:6:12: error: expected unqualified-id before '&&' token
   for (auto&& i : { 4, 5, 6, 10 })
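
The error looks like what g++ reports in its default C++98 mode, where
"auto" type deduction and range-based for are not available, so
-std=c++11 is presumably needed. As a fallback that does not depend on
the language mode, here is a minimal sketch of the same loop without
C++11 features. The names make_bits_cxx98 and vals are made up for
illustration, and I have not verified that it produces the same RTL.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical C++98 rewrite of the reporter's test case. The values come
// from the bug report; the array and function names are illustrative only.
static const unsigned int vals[] = { 4, 5, 6, 10 };

std::bitset<32> make_bits_cxx98 (void)
{
  std::bitset<32> r;
  // Same range check as the original: only set bits that fit in the bitset.
  for (std::size_t k = 0; k < sizeof (vals) / sizeof (vals[0]); ++k)
    if (vals[k] < r.size ())
      r.set (vals[k]);

  return r;
}
```

Because the values now live in a named static array rather than a
compiler-generated initializer_list, the constant pool label will
differ from the _._45 seen in the report.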

Teresa

>         mov     r6,r5
> .L3:
>         dt      r2
>         bf/s    .L2     // branch if value not > 31, i.e. in each iteration
>         add     #4,r1
>         rts
>         nop
>         .align 1
> .L7:
>         shld    r3,r5
>         bra     .L3
>         or      r5,r0
> .L9:
>         .align 2
> .L8:
>         .long   _._45+0
>
> _._45:
>         .long   4
>         .long   5
>         .long   6
>         .long   10
>
> Disabling the bb-reorder pass or using -Os results in more compact and faster
> code:
>
>         mov.l   .L7,r1
>         mov     #0,r0
>         mov     #31,r7
>         mov     #1,r6
>         mov     #4,r2
> .L2:
>         mov.l   @r1,r3
>         cmp/hi  r7,r3
>         bt/s    .L3    // branch if value > 31, i.e. never.
>         mov     r6,r5
>         shld    r3,r5
>         or      r5,r0
> .L3:
>         dt      r2
>         bf/s    .L2
>         add     #4,r1
>         rts
>         nop
>
> Of course the bb-reorder pass doesn't know that the values in this case are
> always in range.  Still, maybe it could be improved by not splitting out a BB
> if it consists only of a few insns?  I've tried increasing the branch cost but
> it won't do anything.
>
> Teresa, Steven,
>
