> On 14.09.2018, at 23:35, Segher Boessenkool <seg...@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Mon, Sep 10, 2018 at 12:59:19PM +0200, Ilya Leoshkevich wrote:
>> Consider the following RTL:
>> 
>> (code_label 11 10 26 4 2 (nil) [1 uses])
>> (note 26 11 12 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>> (insn 12 26 15 4 (set (reg:SI 65)
>>         (if_then_else:SI (eq (reg:CCZ 33 %cc)
>>                 (const_int 0 [0]))
>>             (const_int 1 [0x1])
>>             (const_int 0 [0]))) "pr80080-4.c":9 1674 {*movsicc})
>> (insn 15 12 16 4 (parallel [
>>             (set (reg:CCZ 33 %cc)
>>                 (compare:CCZ (reg:SI 65)
>>                     (const_int 0 [0])))
>>             (clobber (scratch:SI))
>>         ]) "pr80080-4.c":9 1216 {*tstsi_cconly_extimm})
>> (jump_insn 16 15 17 4 (set (pc)
>>         (if_then_else (ne (reg:CCZ 33 %cc)
>>                 (const_int 0 [0]))
>>             (label_ref:DI 23)
>>             (pc))) "pr80080-4.c":9 1897 {*cjump_64})
>> 
>> Combine simplifies this into:
>> 
>> (code_label 11 10 26 4 2 (nil) [1 uses])
>> (note 26 11 12 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>> (note 12 26 15 4 NOTE_INSN_DELETED)
>> (note 15 12 16 4 NOTE_INSN_DELETED)
>> (jump_insn 16 15 17 4 (set (pc)
>>         (if_then_else (eq (reg:CCZ 33 %cc)
>>                 (const_int 0 [0]))
>>             (label_ref:DI 23)
>>             (pc))) "pr80080-4.c":9 1897 {*cjump_64})
>> 
>> opening up the possibility to perform jump threading.  Since this
>> happens infrequently, perform jump threading only when there is a
>> changed basic block whose sole side effect is a trailing jump.
> 
> So this happens because now there is *only* a conditional jump in this BB?
Yes.

> Could you please show generated code before and after this patch?
> I mean generated assembler code.  What -S gives you.

Before:

foo4:
.LFB0:
	.cfi_startproc
	lt	%r1,0(%r2)
	jne	.L2
	lhi	%r3,1
	cs	%r1,%r3,0(%r2)
.L2:
	jne	.L5
	br	%r14
.L5:
	jg	bar

After:

foo4:
.LFB0:
	.cfi_startproc
	lt	%r1,0(%r2)
	jne	.L4
	lhi	%r3,1
	cs	%r1,%r3,0(%r2)
	jne	.L4
	br	%r14
.L4:
	jg	bar

>> +/* Return true iff the only side effect of BB is its trailing jump_insn.  */
>> +
>> +static bool
>> +is_single_jump_bb (basic_block bb)
>> +{
>> +  rtx_insn *end = BB_END (bb);
>> +  rtx_insn *insn;
>> +
>> +  if (!JUMP_P (end))
>> +    return false;
>> +
>> +  for (insn = BB_HEAD (bb); insn != end; insn = NEXT_INSN (insn))
>> +    if (INSN_P (insn) && side_effects_p (PATTERN (insn)))
>> +      return false;
>> +  return true;
>> +}
> 
> Hrm, so it is more than that.

The intention of this check is to match the "Short circuit cases where
block B contains some side effects, as we can't safely bypass it" one in
thread_jump ().  I've just noticed that I got it wrong - I should also
check whether the JUMP_INSN itself has side effects, like it's done
there (a corrected sketch is appended at the end of this mail).

> Why does the existing jump threading not work for you; should it happen
> at another time?

We call cleanup_cfg (CLEANUP_THREADING) only once - during the "jump"
pass, which happens before combine.  There is also the "jump2" pass,
which happens afterwards, and after a discussion with Ulrich Weigand I
tried to move jump threading there.  While this change had the desired
effect on the testcase, the code got worse in other places.

Example from GemsFDTD:

Before:

 103e96c:	b9 04 00 31 	lgr	%r3,%r1
 103e970:	a7 18 00 00 	lhi	%r1,0
 103e974:	e3 20 f0 d0 00 0d 	dsg	%r2,208(%r15)
 103e97a:	b9 20 00 c3 	cgr	%r12,%r3
 103e97e:	a7 29 ff ff 	lghi	%r2,-1
 103e982:	ec 12 00 01 00 42 	lochih	%r1,1
 103e988:	e3 20 f0 f8 00 82 	xg	%r2,248(%r15)
 103e98e:	1a 15 	ar	%r1,%r5
 103e990:	b9 e9 c0 72 	sgrk	%r7,%r2,%r12
 103e994:	b3 c1 00 e2 	ldgr	%f14,%r2
 103e998:	b9 f8 a0 21 	ark	%r2,%r1,%r10
 103e99c:	12 99 	ltr	%r9,%r9
 103e99e:	a7 18 00 00 	lhi	%r1,0
 103e9a2:	ec 1c 00 01 00 42 	lochile	%r1,1
 103e9a8:	50 10 f1 28 	st	%r1,296(%r15)
 103e9ac:	c2 9d 00 00 00 00 	cfi	%r9,0
 103e9b2:	c0 c4 00 01 28 4e 	jgle	1063a4e
 103e9b8:	e5 4c f1 08 00 00 	mvhi	264(%r15),0
 103e9be:	eb 14 00 03 00 0d 	sllg	%r1,%r4,3

After:

 103e96c:	b9 04 00 31 	lgr	%r3,%r1
 103e970:	e5 4c f1 18 00 01 	mvhi	280(%r15),1
 103e976:	a7 18 00 00 	lhi	%r1,0
 103e97a:	e3 20 f0 d0 00 0d 	dsg	%r2,208(%r15)
 103e980:	b9 20 00 c3 	cgr	%r12,%r3
 103e984:	a7 29 ff ff 	lghi	%r2,-1
 103e988:	ec 12 00 01 00 42 	lochih	%r1,1
 103e98e:	e3 20 f0 f8 00 82 	xg	%r2,248(%r15)
 103e994:	1a 15 	ar	%r1,%r5
 103e996:	b3 c1 00 e2 	ldgr	%f14,%r2
 103e99a:	b9 e9 c0 72 	sgrk	%r7,%r2,%r12
 103e99e:	b9 f8 a0 21 	ark	%r2,%r1,%r10
 103e9a2:	c2 9d 00 00 00 00 	cfi	%r9,0
 103e9a8:	c0 c4 00 01 28 51 	jgle	1063a4a
 103e9ae:	e5 4c f1 18 00 00 	mvhi	280(%r15),0
 103e9b4:	e5 4c f1 08 00 00 	mvhi	264(%r15),0
 103e9ba:	eb 14 00 03 00 0d 	sllg	%r1,%r4,3

Diff:

  lgr	%rn,%rn
- mvhi	m(%rx),1
  lhi	%rn,0
  ar	%rn,%rn
- ldgr	%fn,%rn
  sgrk	%rn,%rn,%rn
+ ldgr	%fn,%rn
  ark	%rn,%rn,%rn
+ ltr	%rn,%rn
+ lhi	%rn,0
+ lochile	%rn,1
+ st	%rn,m(%rx)
  cfi	%rn,0
  mvhi	m(%rx),0
- mvhi	m(%rx),0
  sllg	%rn,%rn,3

The code used to compare %r9 with 0, jump based on the comparison
result, and set the local variable in the branch.  Now it compares %r9
with 0 twice - I haven't looked at the details yet, but apparently we
are missing some optimization because jump threading is done too late.

> 
> Segher
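
Here is the corrected sketch mentioned above, a minimal illustration
assuming the fix is simply to extend the side-effect walk so that it
also covers the trailing jump, the way the loop in thread_jump () does.
This is not the final patch, just the intended check:

/* Return true iff BB ends in a jump and no insn in BB, including that
   jump, has side effects.  Unlike the version quoted above, the walk
   also covers the trailing jump, mirroring thread_jump ().  */

static bool
is_single_jump_bb (basic_block bb)
{
  rtx_insn *end = BB_END (bb);
  rtx_insn *insn;

  if (!JUMP_P (end))
    return false;

  /* Walk up to and including END, so that a jump with side effects
     (e.g. a parallel with extra sets or a volatile access) is rejected.  */
  for (insn = BB_HEAD (bb); insn != NEXT_INSN (end); insn = NEXT_INSN (insn))
    if (INSN_P (insn) && side_effects_p (PATTERN (insn)))
      return false;

  return true;
}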
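
And a purely hypothetical sketch of how the condition from the quoted
commit message ("perform jump threading only when there is a changed
basic block whose sole side effect is a trailing jump") could be wired
up.  The helper name should_thread_jumps_p and the modified_blocks
bitmap are made up for illustration; the actual patch may track the
changed blocks differently:

/* Hypothetical caller: request the extra jump-threading cleanup only if
   combine reduced at least one of the blocks it changed to a bare jump.
   MODIFIED_BLOCKS is an invented bitmap of basic block indices.  */

static bool
should_thread_jumps_p (bitmap modified_blocks)
{
  unsigned int index;
  bitmap_iterator bi;

  EXECUTE_IF_SET_IN_BITMAP (modified_blocks, 0, index, bi)
    if (is_single_jump_bb (BASIC_BLOCK_FOR_FN (cfun, index)))
      return true;

  return false;
}

/* ...and then, after combine has run:

     if (should_thread_jumps_p (modified_blocks))
       cleanup_cfg (CLEANUP_THREADING);  */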