on 2021/1/26 1:59 AM, Richard Sandiford via Gcc-patches wrote:
> Richard Biener <rguent...@suse.de> writes:
>> On Fri, 22 Jan 2021, Segher Boessenkool wrote:
>>
>>> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
>>>> On Thu, 21 Jan 2021, Segher Boessenkool wrote:
>>>>> What is holding up this patch still?  Ke Wen has pinged it every month
>>>>> since May, and there has still not been a review.
>>>
>>> Richard Sandiford wrote:
>>>> FAOD (since I'm on cc:), I don't feel qualified to review this.
>>>> Tree-level loop stuff isn't really my area.
>>>
>>> And Richard Biener wrote:
>>>> I don't like it, it feels wrong but I don't have a good suggestion
>>>> that had positive feedback.  Since a reviewer / approver is indirectly
>>>> responsible for at least the design I do not want to ack this patch.
>>>> Bin made forward progress on the other parts of the series but clearly
>>>> there's somebody missing with the appropriate privileges who feels
>>>> positive about the patch and its general direction.
>>>>
>>>> Sorry to be of no help here.
>>>
>>> How unfortunate :-(
>>>
>>> So, first off, this will then have to work for next stage 1 to make any
>>> progress.  Rats.
>>>
>>> But what could have been done differently that would have helped?  Of
>>> course Ke Wen could have written a better patch (aka one that is more
>>> acceptable); either of you could have made your current replies earlier,
>>> so that it is clear help needs to be sought elsewhere; and I could have
>>> pushed people earlier, too.  No one really did anything wrong, I'm not
>>> seeking who to blame, I'm just trying to find out how to prevent
>>> deadlocks like this in the future (where one party waits for replies
>>> that will never come).
>>>
>>> Is it just that we have a big gaping hole in reviewers with experience
>>> in such loop optimisations?
>>
>> May be.  But what I think is the biggest problem is that we do not
>> have a good way to achieve what the patch tries (if you review the
>> communications you'll see many ideas tossed around) first and foremost
>> because IV selection is happening early on GIMPLE and unrolling
>> happens late on RTL.  Both need a quite accurate estimate of costs
>> but unrolling has an ever harder time than IV selection where we've
>> got along with throwing dummy RTL at costing functions.
>>
>> IMHO the patch is the wrong "start" to try fixing the issue and my
>> fear is that wiring this kind of "features" into the current
>> (fundamentally broken) state will make it much harder to rework
>> that state without introducing regressions on said features (I'm
>> there with trying to turn the vectorizer upside down - for three
>> years now, struggling to not regress any of the "features" we've
>> accumulated for various targets where most of them feel
>> "bolted-on" rather than well-designed ;/).
>
> Thinking of any features in particular here?
>
> Most of the ones I can think of seem to be doing things in the way
> that the current infrastructure expects.  But of course, the current
> infrastructure isn't perfect, so the end result isn't either.
>
> Still, I agree with the above apart from maybe that last bit. ;-)
>
>> I think IV selection and unrolling (and scheduling FWIW) need to move
>> closer together.  I do not have a good idea how that can work out
>> though but I very much believe that this "most wanted" GIMPLE unroller
>> will not be a good way of progressing here.
>
> What do you feel about unrolling in the vectoriser (by doubling the VF, etc.)
> in cases where something about the target indicates that that would be
> useful?  I think that's a good place to do it (for the cases that it
> handles) because it's hard to unroll later and then interleave.
>
>> Maybe taking the bullet and moving IV selection back to RTL is the
>> answer.
>
> I think that would be a bad move.  The trend recently seems to have been
> to lower stuff to individual machine operations earlier in the rtl pass
> pipeline (often immediately during expand) rather than split them later.
> The reasoning behind that is that (1) gimple has already heavily optimised
> the unlowered form and (2) lowering earlier gives the more powerful rtl
> optimisers a chance to do something with the individual machine operations.
> It's going to be hard for an RTL ivopts pass to piece everything back
> together.
>
>> For a "short term" solution I still think that trying to perform
>> unrolling and IV selection (for the D-form case you're targeting)
>> at the same time is a better design, even if it means complicating
>> the IV selection pass (and yeah, it'll still be at GIMPLE and w/o
>> any good idea about scheduling).  There are currently 20+ GIMPLE
>> optimization passes and 10+ RTL optimization passes between
>> IV selection and unrolling, the idea that you can have transform
>> decision and transform apply this far apart looks scary.
>
> FWIW, another option might be to go back to something like:
>
>   https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532676.html
>
> I agree that it was worth putting that series on hold and trying a more
> target-independent approach, but I think in the end it didn't work out,
> for the reasons Richard says.  At least the target-specific pass would
> be making a strict improvement to the IL that it sees, rather than
> having to predict what future passes might do or might want.
>
Yeah, I also had this thought in mind: if we cannot find a good
target-independent approach, it seems good to revisit this series and
work with the target-specific approach.

BR,
Kewen