Hi, sorry for late response, On 2021/3/23 16:50, Richard Biener wrote: >>> It definitely should be before uncprop (but context stops there). And yes, >>> re-running passes isn't the very, very best thing to do without explaining >>> it cannot be done in other ways. Not for late stage 3 anyway. >>> >>> Richard. >>> >> Thanks. Also tried to implement this in a seperate RTL pass, which >> would be better? I guess this would also be stage1 issues... > Yes, that's also for stage1 obviously. Can you check how the number > of sink opportunities of your RTL pass changes if you add the > 2nd GIMPLE sinking pass?
Number of Instructions sank out of loop when running (no sink2 -> sink2): 1. SPEC2017 int: 371 -> 142 2. SPEC2017 float: 949 -> 343 3. bootstrap: 402 -> 229 4. stage1 libraries: 115 -> 68 5. regression tests: 4533 -> 2948 the case I used from the beginning could all be optimized by gimple sink2 instead, but there are still many instructions sunk even gimple sink2 is added, I guess most of them are produced by expand pass. One example(It was after #38 in 262r.reginfo, note that new block is created between exit->src and exit->dst to avoid other bb jumps into exit->dst cause execution error due to r132:DI updated unexpectedly.), sometimes extra zero extend in loop like this could cause serious performance issue: vect-live-4.ltrans0.ltrans.263r.sink: ... Loop 2: sinking (set (reg/v:DI 132 [ <retval> ]) (sign_extend:DI (reg:SI 144))) from bb 7 to bb 11 ... 44: L44: 35: NOTE_INSN_BASIC_BLOCK 7 36: r127:DI=r127:DI+0x4 37: r145:SI=[r127:DI] 38: r144:SI=r145:SI+0x5 REG_DEAD r145:SI 40: r125:DI=r125:DI+0x4 41: [r125:DI]=r144:SI REG_DEAD r144:SI 42: r146:SI=r126:DI#0-0x1 REG_DEAD r126:DI 43: r126:DI=zero_extend(r146:SI) REG_DEAD r146:SI 45: r147:CC=cmp(r126:DI,0) 46: pc={(r147:CC!=0)?L65:pc} REG_DEAD r147:CC REG_BR_PROB 941032164 68: NOTE_INSN_BASIC_BLOCK 11 69: r132:DI=sign_extend(r144:SI) ; pc falls through to BB 8 65: L65: 64: NOTE_INSN_BASIC_BLOCK 9 ; pc falls through to BB 7 47: L47: 48: NOTE_INSN_BASIC_BLOCK 8 > > I'll note that you are not sinking later stmts first (you only > walk insns reverse but not BBs). GIMPLE sinking performs a > domwalk over post dominators (but it has an easier job because > of PHIs). I guess you'd want to walk starting from loop exit > blocks (from innermost loops as you do) in reverse program order. Yes, this rtl sink could only sink instruction from *loop header* to every loop exit blocks in reverse order, it is a bit strict since this is the first step to see whether it is reasonable to add such a pass. For example, if the instruction is in loop body block, sink it out will cause execution error sometimes as it doesn't have information in it whether the loop body block will be executed or not, if the loop jumps from header to exit directly, the instructions sunk from body to exit will change register value unexpected, seems always_reached and always_executed in loop-invarinat.c could be reused here to determine such circumstance? > > I'll also note that you don't have to check whether stmts you > want to sink have their uses set after it - you can emit copies > to a new pseudo at the original insn location and use those > after the loop (that of course comes at some cost). Not sure whether I understood your point correctly, but the instruction is still in loop executed loop niter times? What I am trying to do is move r132:DI=sign_extend(r144:SI) out of loop, if it was executed in loop 100 times, r132 is not used in loop, and r144 is not updated after current instruction, then move it to loop exit to execute one once. > > Also we already have a sinking pass on RTL which even computes > a proper PRE on the reverse graph - -fgcse-sm aka store-motion.c. > I'm not sure whether this deals with non-stores but the > LCM machinery definitely can handle arbitrary expressions. I wonder > if it makes more sense to extend this rather than inventing a new > ad-hoc sinking pass? >From the literal, my pass doesn't handle or process store instructions like store-motion.. Thanks, will check it. -- Thanks, Xionghu