On Tue, 26 Jan 2021, Kewen.Lin wrote: > Hi Segher/Richard B./Richard S., > > Many thanks for your all helps and comments on this! > > on 2021/1/25 下午3:56, Richard Biener wrote: > > On Fri, 22 Jan 2021, Segher Boessenkool wrote: > > > >> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote: > >>> On Thu, 21 Jan 2021, Segher Boessenkool wrote: > >>>> What is holding up this patch still? Ke Wen has pinged it every month > >>>> since May, and there has still not been a review. > >> > >> Richard Sandiford wrote: > >>> FAOD (since I'm on cc:), I don't feel qualified to review this. > >>> Tree-level loop stuff isn't really my area. > >> > >> And Richard Biener wrote: > >>> I don't like it, it feels wrong but I don't have a good suggestion > >>> that had positive feedback. Since a reviewer / approver is indirectly > >>> responsible for at least the design I do not want to ack this patch. > >>> Bin made forward progress on the other parts of the series but clearly > >>> there's somebody missing with the appropriate privileges who feels > >>> positive about the patch and its general direction. > >>> > >>> Sorry to be of no help here. > >> > >> How unfortunate :-( > >> > >> So, first off, this will then have to work for next stage 1 to make any > >> progress. Rats. > >> > >> But what could have been done differently that would have helped? Of > >> course Ke Wen could have written a better patch (aka one that is more > >> acceptable); either of you could have made your current replies earlier, > >> so that it is clear help needs to be sought elsewhere; and I could have > >> pushed people earlier, too. No one really did anything wrong, I'm not > >> seeking who to blame, I'm just trying to find out how to prevent > >> deadlocks like this in the future (where one party waits for replies > >> that will never come). > >> > >> Is it just that we have a big gaping hole in reviewers with experience > >> in such loop optimisations? > > > > May be. But what I think is the biggest problem is that we do not > > have a good way to achieve what the patch tries (if you review the > > communications you'll see many ideas tossed around) first and foremost > > because IV selection is happening early on GIMPLE and unrolling > > happens late on RTL. Both need a quite accurate estimate of costs > > but unrolling has an ever harder time than IV selection where we've > > got along with throwing dummy RTL at costing functions. > > > > Yeah, exactly. > > > IMHO the patch is the wrong "start" to try fixing the issue and my > > fear is that wiring this kind of "features" into the current > > (fundamentally broken) state will make it much harder to rework > > that state without introducing regressions on said features (I'm > > there with trying to turn the vectorizer upside down - for three > > years now, struggling to not regress any of the "features" we've > > accumulated for various targets where most of them feel a > > "bolted-on" rather than well-designed ;/). > > > > OK, understandable. > > > I think IV selection and unrolling (and scheduling FWIW) need to move > > closer together. I do not have a good idea how that can work out > > though but I very much believe that this "most wanted" GIMPLE unroller > > will not be a good way of progressing here. Maybe taking the bullet > > and moving IV selection back to RTL is the answer. > > > > I haven't looked into loop-iv.c, but IVOPTS in gimple can leverage > SCEV analysis for iv detection, if moving it to RTL, it could be > very heavier to detect the full set there? > > > For a "short term" solution I still think that trying to perform > > unrolling and IV selection (for the D-form case you're targeting) > > at the same time is a better design, even if it means complicating > > the IV selection pass (and yeah, it'll still be at GIMPLE and w/o > > any good idea about scheduling). There are currently 20+ GIMPLE > > optimization passes and 10+ RTL optimization passes between > > IV selection and unrolling, the idea that you can have transform > > decision and transform apply this far apart looks scary. > > > > I have some questions in mind for this part, for "perform unrolling > and IV selection at the same time", it can be interpreted to two > different implementation ways to me: > > 1) Run one gimple unrolling pass just before IVOPTS, probably using > the same gate for IVOPTS. The unrolling factor is computed by > the same method as that of RTL unrolling. But this sounds very > like "most wanted gimple unrolling" which is what we want to avoid. > > The positive aspect here is what IVOPTS faces is already one unrolled > loop, it can see the whole picture and get the optimal IV set. The > downside/question is how we position these gimple unrolling and RTL > unrolling passes, whether we still need RTL unrolling. If no, it's > doubtable that one gimple unrolling can fully replace the RTL > unrolling probably lacking some actual target information/instructions. > If yes, it's still possible to have inconsistent unrolling factors > between what IVOPTS optimizes on and what late RTL unrolling pass > ends with. > > 2) Make IVOPTS determine the unrolling factor by considering the > reg-offset addressing (D-form), unroll the loop and do the remainings.
Yes, that's what I meant. > I don't think you referred to this though. Since comparing to > reg-reg addressing, reg-offset addressing can only save some bumps, > it's too weak to be one deciding factor of unrolling factor. Unlike > vectorizer or modulo scheduling, it's more likely there are more > important factors for unrolling factor computation than this one. But you _are_ using that bump cost to compute the unroll estimate (and IIRC you are doing the D-form IV selection then). > For IVOPTS, it's more like that it doesn't care what the unrolling > factor should be, but it needs to know what the unrolling factor > would be, then do optimal IV selection based on that. So it's not > good to get it to decide the unrolling factor. Well, currently with the stupid RTL unroller you will know it will be unrolled 8 times unless it is cold or doesn't iterate enough. There's not much costing going on in the RTL unroller (unless you go wild with target hooks). Richard.