On Tue, 7 Jan 2020, Kewen.Lin wrote:

> on 2020/1/7 5:14 PM, Richard Biener wrote:
> > On Mon, 6 Jan 2020, Kewen.Lin wrote:
> >
> >> We are thinking whether it can be handled in IVOPTs instead of one
> >> RTL pass.
> >>
> >> During IVOPTs selecting IV cands, it doesn't know the loop will be
> >> unrolled so it doesn't count the possible step cost in with X-form.
> >> If we can teach it to consider the case, the IV cands which plays
> >> with D-form can be preferred. Currently unrolling (incomplete)
> >> happens in RTL, it looks we have to predict the loop whether unroll
> >> in IVOPTs. Since there is some parameter checks on RTL insn counts
> >> and target hooks, it seems not easy to get that. Besides, we need
> >> to check the step is valid to put into D-form field (eg: DQ-form
> >> requires divide 16 exactly), to ensure no extra ADDIs needed.
> >>
> >> I'm not sure whether it's a good idea to implement in IVOPTs, but I
> >> did some changes in IVOPTs to prove it's doable to get expected
> >> codes, the patch is attached.
> >>
> >> Any comments/suggestions are highly appreciated!
> >
> > Is the unrolled code better than the not unrolled code (assuming
> > optimal IV choice)? Then IMHO IVOPTs should drive the unrolling,
> > either by actually doing it or by forcing it via the loop->unroll
> > setting. I don't think second-guessing the RTL unroller at this
> > point is going to work. Alternatively turn X-form into D-form during
> > RTL unrolling?
> >
>
> Hi Richard,
>
> Thanks for the comments!
>
> Yes, unrolled version is better on Power9 for both forms, but D-form
> unrolled is better than X-form unrolled. If we drive unrolling in
> IVOPTs, not sure it will be a concern that IVOPTs becomes too heavy? or
> too rude with forced UF if imprecise? Do we still have the plan to
> introduce one middle-end unroll pass, does it help if yes?

I have the opinion that an isolated unrolling pass is not wanted.
Instead unrolling should be driven by some profitability metric
which in your case is better induction variable optimization.
In the "usual" case it is better scheduling where then scheduling
should drive unrolling.

> The quoted RTL patch is to propose one RTL pass after RTL loop
> passes, it also sounds good to check whether RTL unrolling is a
> good place!

Why would you need a new RTL pass? I'd do it during the unroll
transform itself, ideally on the not unrolled body because that's
likely simpler than updating N copies?

Richard.
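
For reference, the step-validity check mentioned in the quoted text (the step must fit the D-form displacement field, and DQ-form needs a multiple of 16) can be sketched in plain C. This is only an illustration under stated assumptions, not GCC code: the names `disp_form`, `offset_fits` and `unrolled_steps_fit` are hypothetical, and it assumes the usual Power encoding of a signed 16-bit displacement, with DS-form needing a multiple of 4 and DQ-form a multiple of 16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of Power displacement forms (not a GCC type). */
enum disp_form { FORM_D, FORM_DS, FORM_DQ };

/* Alignment the displacement must satisfy for each form (assumed encoding). */
static int64_t form_align(enum disp_form f)
{
    switch (f) {
    case FORM_DS: return 4;   /* low 2 bits of the displacement are implied zero */
    case FORM_DQ: return 16;  /* low 4 bits of the displacement are implied zero */
    default:      return 1;   /* plain D-form: any signed 16-bit value */
    }
}

/* True if OFF is representable as a displacement of form F. */
static bool offset_fits(int64_t off, enum disp_form f)
{
    return off >= -32768 && off <= 32767 && off % form_align(f) == 0;
}

/* True if every unrolled copy i in [0, unroll_factor) can address
   base + i * step directly, so that no extra ADDI is needed between
   the copies.  */
static bool unrolled_steps_fit(int64_t step, unsigned unroll_factor,
                               enum disp_form f)
{
    for (unsigned i = 0; i < unroll_factor; i++) {
        if (!offset_fits((int64_t)i * step, f))
            return false;
    }
    return true;
}
```

With these assumptions, a 16-byte step unrolled 8 times passes for DQ-form, while an 8-byte step fails for DQ-form at the second copy (offset 8) but passes for DS-form.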