On Wed, 14 Apr 2021, Xionghu Luo wrote:

> Hi,
>
> On 2021/3/26 15:35, Xionghu Luo via Gcc-patches wrote:
> >> Also we already have a sinking pass on RTL which even computes
> >> a proper PRE on the reverse graph - -fgcse-sm aka store-motion.c.
> >> I'm not sure whether this deals with non-stores but the
> >> LCM machinery definitely can handle arbitrary expressions.  I wonder
> >> if it makes more sense to extend this rather than inventing a new
> >> ad-hoc sinking pass?
> > From the literal, my pass doesn't handle or process store instructions
> > like store-motion..  Thanks, will check it.
>
> Store motion only processes store instructions with data flow equations,
> generating 4 inputs(st_kill, st_avloc, st_antloc, st_transp) and solve it
> by Lazy Code Motion API(5 DF compute call) with 2 outputs (st_delete_map,
> st_insert_map) globally, each store place is independently represented in
> the input bitmap vectors. Output is which should be delete and where to
> insert, current code does what you said "emit copies to a new pseudo at
> the original insn location and use it in followed bb", actually it is
> "store replacement" instead of "store move", why not save one pseudo by
> moving the store instruction to target edge directly?

It probably simply saves the pass from doing analysis whether the
stored value is clobbered on the sinking path, enabling more store
sinking.  For stores that might be even beneficial, for non-stores
it becomes more of a cost issue, yes.

> There are many differences between the newly added rtl-sink pass and
> store-motion pass.
> 1. Store motion moves only store instructions, rtl-sink ignores store
> instructions.
> 2. Store motion is a global DF problem solving, rtl-sink only processes
> loop header reversely with dependency check in loop, take the below RTL
> as example,
> "#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
> but it moves #538 first, then #235, there is strong dependency here. It
> seems doesn't like the LCM framework that could solve all and do the
> delete-insert in one iteration.

So my question was whether we want to do both within the LCM store
sinking framework.  The LCM dataflow is also used by RTL PRE which
handles both loads and non-loads so in principle it should be able
to handle stores and non-stores for the sinking case (PRE on the
reverse CFG).

A global dataflow is more powerful than any local ad-hoc method.

Richard.

> However, there are still some common methods could be shared, like the
> def-use check(though store-motion is per bb, rtl-sink is per loop),
> insert_store, commit_edge_insertions etc.
>
>
>   508: L508:
>   507: NOTE_INSN_BASIC_BLOCK 34
>    12: r139:DI=r140:DI
>       REG_DEAD r140:DI
>   240: L240:
>   231: NOTE_INSN_BASIC_BLOCK 35
>   232: r142:DI=zero_extend(r139:DI#0)
>   233: r371:SI=r142:DI#0-0x1
>   234: r243:DI=zero_extend(r371:SI)
>       REG_DEAD r371:SI
>   235: r452:DI=r262:DI+r139:DI
>   538: r194:DI=r452:DI
>   236: r372:CCUNS=cmp(r142:DI#0,r254:DI#0)
>   237: pc={(geu(r372:CCUNS,0))?L246:pc}
>       REG_DEAD r372:CCUNS
>       REG_BR_PROB 59055804
>   238: NOTE_INSN_BASIC_BLOCK 36
>   239: r140:DI=r139:DI+0x1
>   241: r373:DI=r251:DI-0x1
>   242: r374:SI=zero_extend([r262:DI+r139:DI])
>       REG_DEAD r139:DI
>   243: r375:SI=zero_extend([r373:DI+r140:DI])
>       REG_DEAD r373:DI
>   244: r376:CC=cmp(r374:SI,r375:SI)
>       REG_DEAD r375:SI
>       REG_DEAD r374:SI
>   245: pc={(r376:CC==0)?L508:pc}
>       REG_DEAD r376:CC
>       REG_BR_PROB 1014686028
>   246: L246:
>   247: NOTE_INSN_BASIC_BLOCK 37
>   248: r377:SI=r142:DI#0-0x2
>       REG_DEAD r142:DI
>   249: r256:DI=zero_extend(r377:SI)
>
>
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
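
For readers following the bb 35 example, the picking order described
in the quoted text (#538 first, then #235, #234, #233) can be
reproduced with a small model of a local, dependency-ordered reverse
scan.  This is only a minimal Python sketch, not code from the patch
or from GCC: Insn, sink_order and live_elsewhere are hypothetical
names, the def/use sets are transcribed by hand from the quoted dump,
and it assumes that none of the candidate registers is live on the
other edge out of bb 35 unless passed in explicitly.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    uid: int
    defs: frozenset        # registers written by the insn
    uses: frozenset        # registers read by the insn
    movable: bool = True   # False for the jump that ends the block


def sink_order(block, live_elsewhere=frozenset()):
    """Scan the block backwards; return uids of sinkable insns in pick order."""
    sunk = []
    kept_uses = set(live_elsewhere)  # registers still needed where they are
    kept_defs = set()                # registers written by later insns that stay
    for insn in reversed(block):
        blocked = bool(
            not insn.movable
            or insn.defs & kept_uses   # result is still needed before the exit
            or insn.uses & kept_defs   # an input is overwritten before the exit
            or insn.defs & kept_defs   # a later staying insn writes the same reg
        )
        if blocked:
            kept_uses |= insn.uses
            kept_defs |= insn.defs
        else:
            sunk.append(insn.uid)
    return sunk


def f(*regs):
    return frozenset(regs)


# bb 35 from the quoted dump, in original insn order.
bb35 = [
    Insn(232, f("r142"), f("r139")),
    Insn(233, f("r371"), f("r142")),
    Insn(234, f("r243"), f("r371")),
    Insn(235, f("r452"), f("r262", "r139")),
    Insn(538, f("r194"), f("r452")),
    Insn(236, f("r372"), f("r142", "r254")),
    Insn(237, f(), f("r372"), movable=False),
]

if __name__ == "__main__":
    print(sink_order(bb35))
```

In this model #232 stays because #236 still reads r142, and each
sinkable insn is only picked once every in-block user of its result
has been picked.  If r194 is passed in live_elsewhere, #538 and
therefore #235 stay put.  That chain of decisions is the local
dependency ordering the quoted text refers to; an LCM-style
formulation would instead compute the delete and insert sets from
global bitmaps.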