On Mon, Sep 17, 2018 at 10:50:58AM +0200, Ilya Leoshkevich wrote:
> > Am 14.09.2018 um 23:35 schrieb Segher Boessenkool 
> > <seg...@kernel.crashing.org>:
> > Could you please show generated code before and after this patch?
> > I mean generated assembler code.  What -S gives you.
> 
> Before:
> 
> foo4:
> .LFB0:
>       .cfi_startproc
>       lt      %r1,0(%r2)
>       jne     .L2
>       lhi     %r3,1
>       cs      %r1,%r3,0(%r2)
> .L2:
>       jne     .L5
>       br      %r14
> .L5:
>       jg      bar
> 
> After:
> 
> foo4:
> .LFB0:
>       .cfi_startproc
>       lt      %r1,0(%r2)
>       jne     .L4
>       lhi     %r3,1
>       cs      %r1,%r3,0(%r2)
>       jne     .L4
>       br      %r14
> .L4:
>       jg      bar

Ah.  And a compiler of some weeks old gives

foo4:
.LFB0:
        .cfi_startproc
        lhi     %r3,0
        lhi     %r4,1
        cs      %r3,%r4,0(%r2)
        jne     .L4
        br      %r14
.L4:
        jg      bar

so this is all caused by the recent optimisation that avoids the "cs" if
it can.

> > Why does the existing jump threading not work for you; should it happen
> > at another time?
> 
> We call cleanup_cfg (CLEANUP_THREADING) only once - during the „jump“
> pass, which happens before combine.  There is also „jump2“ pass, which
> happens afterwards, and after discussion with Ulrich Weigand I tried to
> move jump threading there.  While this change had the desired effect on
> the testcase, the code got worse in another places. Example from
> GemsFDTD:
> 
> Before:
> 
> 103e96c:       b9 04 00 31             lgr     %r3,%r1
> 103e970:       a7 18 00 00             lhi     %r1,0
> 103e974:       e3 20 f0 d0 00 0d       dsg     %r2,208(%r15)
> 103e97a:       b9 20 00 c3             cgr     %r12,%r3
> 103e97e:       a7 29 ff ff             lghi    %r2,-1
> 103e982:       ec 12 00 01 00 42       lochih  %r1,1
> 103e988:       e3 20 f0 f8 00 82       xg      %r2,248(%r15)
> 103e98e:       1a 15                   ar      %r1,%r5
> 103e990:       b9 e9 c0 72             sgrk    %r7,%r2,%r12
> 103e994:       b3 c1 00 e2             ldgr    %f14,%r2
> 103e998:       b9 f8 a0 21             ark     %r2,%r1,%r10
> 103e99c:       12 99                   ltr     %r9,%r9
> 103e99e:       a7 18 00 00             lhi     %r1,0
> 103e9a2:       ec 1c 00 01 00 42       lochile %r1,1
> 103e9a8:       50 10 f1 28             st      %r1,296(%r15)
> 103e9ac:       c2 9d 00 00 00 00       cfi     %r9,0
> 103e9b2:       c0 c4 00 01 28 4e       jgle    1063a4e
> 103e9b8:       e5 4c f1 08 00 00       mvhi    264(%r15),0
> 103e9be:       eb 14 00 03 00 0d       sllg    %r1,%r4,3
> 
> After:
> 
> 103e96c:       b9 04 00 31             lgr     %r3,%r1       
> 103e970:       e5 4c f1 18 00 01       mvhi    280(%r15),1
> 103e976:       a7 18 00 00             lhi     %r1,0         
> 103e97a:       e3 20 f0 d0 00 0d       dsg     %r2,208(%r15) 
> 103e980:       b9 20 00 c3             cgr     %r12,%r3      
> 103e984:       a7 29 ff ff             lghi    %r2,-1        
> 103e988:       ec 12 00 01 00 42       lochih  %r1,1         
> 103e98e:       e3 20 f0 f8 00 82       xg      %r2,248(%r15) 
> 103e994:       1a 15                   ar      %r1,%r5       
> 103e996:       b3 c1 00 e2             ldgr    %f14,%r2
> 103e99a:       b9 e9 c0 72             sgrk    %r7,%r2,%r12  
> 103e99e:       b9 f8 a0 21             ark     %r2,%r1,%r10  
> 103e9a2:       c2 9d 00 00 00 00       cfi     %r9,0         
> 103e9a8:       c0 c4 00 01 28 51       jgle    1063a4a
> 103e9ae:       e5 4c f1 18 00 00       mvhi    280(%r15),0
> 103e9b4:       e5 4c f1 08 00 00       mvhi    264(%r15),0   
> 103e9ba:       eb 14 00 03 00 0d       sllg    %r1,%r4,3     
> 
> Diff:
> 
>                 lgr     %rn,%rn
> -               mvhi    m(%rx),1
>                 lhi     %rn,0
> 
>                 ar      %rn,%rn
> -               ldgr    %fn,%rn
>                 sgrk    %rn,%rn,%rn
> +               ldgr    %fn,%rn
>                 ark     %rn,%rn,%rn
> +               ltr     %rn,%rn
> +               lhi     %rn,0
> +               lochile %rn,1
> +               st      %rn,m(%rx)
>                 cfi     %rn,0
> 
>                 mvhi    m(%rx),0
> -               mvhi    m(%rx),0
>                 sllg    %rn,%rn,3
> 
> The code used to compare %r9 with 0, jump based on the comparison
> result, and set the local variable in the branch.  Now it compares %r9
> with 0 twice - I didn’t look at the details yet, but apparently we’re
> missing some optimization because jump threading is done too late.

Yeah, jump2 is quite late.  I think you should do it before postreload_cse
or similar.

Btw, what percentage of files (or subroutines) does jump threading run
after combine, with your patch?


Segher

Reply via email to