On Thu, Apr 23, 2020 at 2:52 PM Segher Boessenkool <seg...@kernel.crashing.org> wrote:
>
> On Thu, Apr 23, 2020 at 02:25:40PM +0200, Richard Biener wrote:
> > > > But being stuck with something means no progress... I know
> > > > very well it's 100 times harder to get rid of something than to
> > > > add something new on top.
> > >
> > > Well, what progress do you expect to make? After expand that is :-)
> >
> > I'd like the RTL pipeline before RA to shrink significantly, no PRE,
> > no CSE, ...
>
> RTL CSE for example is very much required to get any good code. It
> needs to CSE stuff that wasn't there before expand.

Sure, but then we should fix that!

> The pass currently does much more (as well as not enough), of course.
>
> > The important part before RA is robust and intelligent
> > instruction selection - part of which is already done at RTL expansion
> > time.
>
> LOL.
>
> The expand pass doesn't often make good choices, and it *shouldn't*, it
> should not make many choices at all; it should just generate valid RTL,
> new pseudos for everything, and let later RTL passes make faster code
> from that.

But valid RTL is instructions that are recognized. Which means
when the target doesn't support an SImode add we may not create
one. That's instruction selection ;)

> > > Most of what is done in RTL is done very well.
> >
> > Umm, well... I beg to differ with regard to DF and passes like
> > postreload-gcse.
>
> What is wrong with DF?

It's slow and memory hungry?

> Is there something particular in postreload-gcse that is bad? To me it
> always is just one of those passes that doesn't do anything :-) That
> can and should be cleaned up, sure :-)

postreload-gcse is ad-hoc, it uses full blown gcse tools that easily
blow up (compute_transp) when it doesn't really require it
(I've fixed things up a bit in dc91c65378cd0e6c0). But I wonder why,
if we want to do PRE of loads, we don't simply schedule another
gcse pass rather than implementing a new one. IIRC what the pass
does could be done with much more local dataflow.

Both postreload gcse and cse are major time-hogs on "bad" testcases :/

> > > Replacing specific things in RTL, maybe as big as whole passes, already
> > > is hard to do without regressing (a *lot*). And if there is no real
> > > reason to do that...
> >
> > The motivation is to make GCC faster, obviously. Not spending time
> > doing things that should have been done before (RTL PRE vs. GIMPLE PRE,
> > etc.).
> > Using the same infrastructure (what, no loop dependency analysis on RTL?),
> > etc.
>
> But everything you want to remove isn't high on profiles anyway? And
> you proposed adding bigger, slower, stuff to replace it all with.
>
> Slow are RA, and the language frontends (esp. C++ like to take 15% of
> total :-/)
>
> > You could say we should do more on RTL and enhance it instead, like do
> > vectorization where we actually could have a better idea on costs and
> > capabilities. But I'm a GIMPLE person and don't know enough of RTL to
> > enhance it ...
>
> Oh no, I think we should do more earlier, and GIMPLE is a fine IR for
> there. But for low-level, close-to-the-machine stuff, RTL is much
> better suited. And we *do* want to optimise at that level as well, and
> much more than just peepholes.

Well, everything that requires costing (unrolling, vectorization, IV selection
to name a few) _is_ close-to-the-machine. We're just saying they are not
because GIMPLE is so much easier to work with here (not sure why
exactly...).

Richard.

>
> Segher