On Mon, 25 Jan 2021, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Fri, 22 Jan 2021, Segher Boessenkool wrote:
> >
> >> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
> >> > On Thu, 21 Jan 2021, Segher Boessenkool wrote:
> >> > > What is holding up this patch still?  Ke Wen has pinged it every
> >> > > month since May, and there has still not been a review.
> >>
> >> Richard Sandiford wrote:
> >> > FAOD (since I'm on cc:), I don't feel qualified to review this.
> >> > Tree-level loop stuff isn't really my area.
> >>
> >> And Richard Biener wrote:
> >> > I don't like it, it feels wrong, but I don't have a good suggestion
> >> > that had positive feedback.  Since a reviewer / approver is
> >> > indirectly responsible for at least the design, I do not want to ack
> >> > this patch.  Bin made forward progress on the other parts of the
> >> > series, but clearly there's somebody missing with the appropriate
> >> > privileges who feels positive about the patch and its general
> >> > direction.
> >> >
> >> > Sorry to be of no help here.
> >>
> >> How unfortunate :-(
> >>
> >> So, first off, this will then have to wait for next stage 1 to make
> >> any progress.  Rats.
> >>
> >> But what could have been done differently that would have helped?  Of
> >> course Ke Wen could have written a better patch (aka one that is more
> >> acceptable); either of you could have made your current replies
> >> earlier, so that it is clear help needs to be sought elsewhere; and I
> >> could have pushed people earlier, too.  No one really did anything
> >> wrong, and I'm not seeking someone to blame; I'm just trying to find
> >> out how to prevent deadlocks like this in the future (where one party
> >> waits for replies that will never come).
> >>
> >> Is it just that we have a big gaping hole in reviewers with experience
> >> in such loop optimisations?
> >
> > Maybe.  But what I think is the biggest problem is that we do not have
> > a good way to achieve what the patch tries (if you review the
> > communications you'll see many ideas tossed around), first and foremost
> > because IV selection happens early on GIMPLE while unrolling happens
> > late on RTL.  Both need a quite accurate estimate of costs, but
> > unrolling has an even harder time than IV selection, where we've got
> > along with throwing dummy RTL at costing functions.
> >
> > IMHO the patch is the wrong "start" for trying to fix the issue, and my
> > fear is that wiring these kinds of "features" into the current
> > (fundamentally broken) state will make it much harder to rework that
> > state without introducing regressions on said features (I'm there with
> > trying to turn the vectorizer upside down - for three years now,
> > struggling not to regress any of the "features" we've accumulated for
> > various targets, most of which feel "bolted-on" rather than
> > well-designed ;/).
>
> Thinking of any features in particular here?

Mostly all of the special cases in load/store vectorization, but then ...

> Most of the ones I can think of seem to be doing things in the way
> that the current infrastructure expects.  But of course, the current
> infrastructure isn't perfect, so the end result isn't either.

... indeed, most of the issues are there because of design decisions made
very early in the vectorizer's lifetime.

> Still, I agree with the above apart from maybe that last bit. ;-)
>
> > I think IV selection and unrolling (and scheduling FWIW) need to move
> > closer together.  I do not have a good idea how that can work out,
> > though, but I very much believe that this "most wanted" GIMPLE unroller
> > will not be a good way of progressing here.
>
> What do you feel about unrolling in the vectoriser (by doubling the VF,
> etc.) in cases where something about the target indicates that that
> would be useful?  I think that's a good place to do it (for the cases
> that it handles) because it's hard to unroll later and then interleave.

I think that's fine (similar to unrolling during IV selection).  Unrolling
if there's a reason (other than unrolling itself) is good.  Of course some
costing has to be applied, which might be difficult on GIMPLE (register
pressure, CPU resource allocation aka scheduling).
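To make that concrete, here is a minimal sketch of the kind of IV
selection / unrolling interaction the D-form case quoted further below is
about.  The loop and the numbers are made up purely for illustration and
are not code from the patch; assume a Power-like target where D-form
addressing is "base register + constant displacement" and X-form is
"base register + index register".

  /* Minimal illustrative sketch only, not code from the patch.  */
  double
  sum (const double *a, long n)
  {
    double s = 0.0;
    for (long i = 0; i < n; i++)   /* candidate for unrolling by 4 */
      s += a[i];
    return s;
  }

  /* If the loop is later unrolled by 4 (assume n % 4 == 0 to keep the
     sketch short), the four loads can share a single stepped base and
     differ only in the constant displacement, i.e. D-form:

       s += a[i];      load  0(base)
       s += a[i + 1];  load  8(base)
       s += a[i + 2];  load 16(base)
       s += a[i + 3];  load 24(base)
       base += 32;     one IV update per unrolled iteration

     If IV selection runs without knowing that the unrolling will only
     happen much later on RTL, its cost model cannot see that one stepped
     base plus displacements beats the per-access alternatives it is
     weighing, which is exactly the gap being discussed.  */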
> > Maybe taking the bullet and moving IV selection back to RTL is the
> > answer.
>
> I think that would be a bad move.  The trend recently seems to have been
> to lower stuff to individual machine operations earlier in the rtl pass
> pipeline (often immediately during expand) rather than split them later.
> The reasoning behind that is that (1) gimple has already heavily
> optimised the unlowered form and (2) lowering earlier gives the more
> powerful rtl optimisers a chance to do something with the individual
> machine operations.  It's going to be hard for an RTL ivopts pass to
> piece everything back together.

Hmm, OK.  But of course it is the individual machine instructions that
determine the cost ...  Anyway, I think the GIMPLE -> RTL transition is
currently too big a step, and we should eventually try to lower GIMPLE
further; IV selection should then happen on that lower form, where we
could for example do a first level of scheduling, or at least have a
better idea of resource allocation and of latencies in dependence cycles,
since that's what really matters for unrolling.

> > For a "short term" solution I still think that trying to perform
> > unrolling and IV selection (for the D-form case you're targeting)
> > at the same time is a better design, even if it means complicating
> > the IV selection pass (and yeah, it'll still be at GIMPLE and w/o
> > any good idea about scheduling).  There are currently 20+ GIMPLE
> > optimization passes and 10+ RTL optimization passes between
> > IV selection and unrolling; the idea that the transform decision and
> > the transform application can be this far apart looks scary.
>
> FWIW, another option might be to go back to something like:
>
>   https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532676.html
>
> I agree that it was worth putting that series on hold and trying a more
> target-independent approach, but I think in the end it didn't work out,
> for the reasons Richard says.  At least the target-specific pass would
> be making a strict improvement to the IL that it sees, rather than
> having to predict what future passes might do or might want.
>
> Thanks,
> Richard

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)