On Wed, 21 Dec 2016, Senthil Kumar Selvaraj wrote:

> Hi,
> 
>   For this C code (slightly modified from PR 30908)
> 
> void wait(int i)
> {
>       while (i-- > 0)
>               asm volatile("nop" ::: "memory");
> }
> 
>   gcc 4.8 at -Os produces
> 
>         jmp     .L2
> .L3:
>         nop
>         decl    %edi
> .L2:
>         testl   %edi, %edi
>         jg      .L3
>         ret
> 
> whereas gcc trunk (and 4.9 onwards, from a quick check) produces
> 
> .L2:
>         testl   %edi, %edi
>         jle     .L5
>         nop
>         decl    %edi
>         jmp     .L2
> .L5:
>         ret
> 
> The code size is identical, but the trunk version executes one more
> instruction everytime the loop runs (explicit jump to .L5 with trunk vs
> fallthrough with 4.8) - it's faster only if the loop never runs. This
> happens irrespective of the memory clobber inline assembler statement.
> 
> Digging into the dump files, I found that the transformation occurs in
> the bb reorder pass, when it calls cfg_layout_initialize, which
> eventually calls try_redirect_by_replacing_jump with in_cfglayout set to
> true. That function then removes the jump and causes the RTL
> transformation that eventually results in slower code.
> 
> Is this intentional? If not, what would be the best way to fix this?

I belive that doing BB reorder in CFG layout mode is fundamentally
flawed but I guess it's wired up so that out-of-CFG layout honors
EDGE_FALLTHRU.

In any way, why does BB reorder not "fix" the "bogus" reorder
into-CFG-layout performs?

Richard.

> Regards
> Senthil
> 
> RTL before and after bbro.
> 
> Before:
> 
> (jump_insn 24 6 25 2 (set (pc)
>         (label_ref 15)) "pr30908.c":3 678 {jump}
>      (nil)
>  -> 15)
> (barrier 25 24 17)
> (code_label 17 25 12 3 3 "" [1 uses])
> (note 12 17 13 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
> (insn 13 12 14 3 (parallel [
>             (asm_operands/v ("nop") ("") 0 []
>                  []
>                  [] pr30908.c:4)
>             (clobber (mem:BLK (scratch) [0  A8]))
>             (clobber (reg:CCFP 18 fpsr))
>             (clobber (reg:CC 17 flags))
>         ]) "pr30908.c":4 -1
>      (expr_list:REG_UNUSED (reg:CCFP 18 fpsr)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> (insn 14 13 15 3 (parallel [
>             (set (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
>                 (plus:SI (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
>                     (const_int -1 [0xffffffffffffffff])))
>             (clobber (reg:CC 17 flags))
>         ]) 210 {*addsi_1}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (code_label 15 14 16 4 2 "" [1 uses])
> (note 16 15 18 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (insn 18 16 19 4 (set (reg:CCNO 17 flags)
>         (compare:CCNO (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
>             (const_int 0 [0]))) "pr30908.c":3 3 {*cmpsi_ccno_1}
>      (nil))
> (jump_insn 19 18 30 4 (set (pc)
>         (if_then_else (gt (reg:CCNO 17 flags)
>                 (const_int 0 [0]))
>             (label_ref 17)
>             (pc))) "pr30908.c":3 646 {*jcc_1}
>      (expr_list:REG_DEAD (reg:CCNO 17 flags)
>         (int_list:REG_BR_PROB 8500 (nil)))
>  -> 17)
> (note 30 19 28 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
> (note 28 30 29 5 NOTE_INSN_EPILOGUE_BEG)
> (jump_insn 29 28 31 5 (simple_return) "pr30908.c":5 708 
> {simple_return_internal}
>      (nil)
>  -> simple_return)
> 
> After:
> 
> <snip>
> (code_label 15 6 16 3 2 "" [1 uses])
> (note 16 15 18 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
> (insn 18 16 19 3 (set (reg:CCNO 17 flags)
>         (compare:CCNO (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
>             (const_int 0 [0]))) "pr30908.c":3 3 {*cmpsi_ccno_1}
>      (nil))
> (jump_insn 19 18 12 3 (set (pc)
>         (if_then_else (le (reg:CCNO 17 flags)
>                 (const_int 0 [0]))
>             (label_ref:DI 34)
>             (pc))) "pr30908.c":3 646 {*jcc_1}
>      (expr_list:REG_DEAD (reg:CCNO 17 flags)
>         (int_list:REG_BR_PROB 1500 (nil)))
>  -> 34)
> (note 12 19 13 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (insn 13 12 14 4 (parallel [
>             (asm_operands/v ("nop") ("") 0 []
>                  []
>                  [] pr30908.c:4)
>             (clobber (mem:BLK (scratch) [0  A8]))
>             (clobber (reg:CCFP 18 fpsr))
>             (clobber (reg:CC 17 flags))
>         ]) "pr30908.c":4 -1
>      (expr_list:REG_UNUSED (reg:CCFP 18 fpsr)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> (insn 14 13 35 4 (parallel [
>             (set (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
>                 (plus:SI (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
>                     (const_int -1 [0xffffffffffffffff])))
>             (clobber (reg:CC 17 flags))
>         ]) 210 {*addsi_1}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (jump_insn 35 14 36 4 (set (pc)
>         (label_ref 15)) -1
>      (nil)
>  -> 15)
> (barrier 36 35 34)
> (code_label 34 36 30 5 5 "" [1 uses])
> (note 30 34 28 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
> (note 28 30 29 5 NOTE_INSN_EPILOGUE_BEG)
> (jump_insn 29 28 31 5 (simple_return) "pr30908.c":5 708 
> {simple_return_internal}
>      (nil)
>  -> simple_return)
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to