On 10/18/22 15:51, Vineet Gupta wrote:



Where BB4 corresponds to .L2 and BB6 corresponds to .L3. Evaluation of the constants occurs in BB3 and BB5.

And "evaluation" here means use of the constant (as opposed to its definition)?

In this case, use of the constant.



PRE/GCSE is better suited for this scenario, but it has a critical constraint.  In particular, our PRE formulation is never allowed to put an evaluation of an expression on a path that didn't have one before. So while there is clearly a redundancy on the path 2->3->4->5 (BB3 and BB5), there is nowhere we could put an evaluation that would reduce the number of evaluations on that path without introducing an evaluation on a path that didn't have one.  Consider 2->4->6: that path has zero evaluations, so we can't place an eval in BB2 because that would introduce an evaluation on 2->4->6, which previously had none.
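For concreteness, a small sketch that enumerates the paths and counts evaluations per path. The block numbers come from the example above, but the exact edge set is my assumption about the CFG being discussed:

```python
# Assumed CFG for the example: BB2 branches to BB3 or BB4, BB3 falls
# through to BB4, BB4 branches to BB5 or BB6.
edges = {2: [3, 4], 3: [4], 4: [5, 6], 5: [], 6: []}
evals = {3, 5}  # blocks that evaluate the constant

def paths(block, path=()):
    """Yield every complete path from `block` to an exit block."""
    path = path + (block,)
    if not edges[block]:
        yield path
    for succ in edges[block]:
        yield from paths(succ, path)

for p in paths(2):
    print(p, "evals:", sum(b in evals for b in p))
```

Under this edge set, 2->3->4->5 carries two evaluations (the redundancy) while 2->4->6 carries zero, which is exactly why no single insertion point helps without violating the constraint.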

OK. How does PRE calculate all the possible paths to consider, say 2->3->4->5 and 2->4->6 in your example? Are those just indicative, or would they actually be the ones PRE calculates for this case? Would there be more?

PRE has a series of dataflow equations it solves which gives it the answer to that question.  The one that computes this property is usually called "anticipated".  Given some block B in a graph G, an expression is anticipated at B when the expression is guaranteed to be computed if we reach B.  That doesn't mean the evaluation must happen in B, just that evaluation at some point is guaranteed if we reach B.
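A minimal sketch of how the anticipated property could be computed for a single expression by backward fixpoint iteration. This ignores expression kills, and the edge set for the example CFG is my assumption:

```python
# Assumed CFG from the example; BB3 and BB5 evaluate the expression.
succs = {2: [3, 4], 3: [4], 4: [5, 6], 5: [], 6: []}
evals = {3, 5}

# ant_in[b]: expression is anticipated on entry to block b.
ant_in = {b: False for b in succs}
changed = True
while changed:
    changed = False
    for b in succs:
        # ANTOUT = intersection of ANTIN over successors (False at exits).
        ant_out = bool(succs[b]) and all(ant_in[s] for s in succs[b])
        # ANTIN = block evaluates it, or it is anticipated on exit.
        new = b in evals or ant_out
        if new != ant_in[b]:
            ant_in[b] = new
            changed = True

print(ant_in)
```

With these assumptions the fixpoint marks the expression anticipated only in BB3 and BB5: BB2 and BB4 each have a successor path (through BB6) that never evaluates it, so PRE may not insert there.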

If an expression is not anticipated in a block, then PRE will not insert in that block since doing so would add evaluations on paths where they did not previously have any.



There isn't a great place in GCC to handle this right now.  If the constraints were relaxed in PRE, then we'd have a chance, but getting the cost model right is going to be tough.

It would have been better (for this specific case) if loop unrolling were not done so early. The tree pass cunroll is flattening the loop and leaving it to the rest of the tree/RTL passes to pick up the pieces and remove any redundancies, if at all. It obviously needs to be early if we are injecting 7x more instructions, but that seems like a lot to unravel.

Yup.  If that loop gets unrolled, it's going to be a mess.  It will almost certainly make this problem worse as each iteration is going to have a pair of constants loaded and no good way to remove them.



FWIW -fno-unroll-loops only seems to work at -O2. At -O3 it always unrolls. Is that expected ?

The only case I'm immediately aware of where this wouldn't work would be if -O3 came after -fno-unroll-loops on the command line.



If this seems worthwhile and you have ideas to do this any better, I'd be happy to work on this with some guidance.

I don't see a great solution here.  Something like Cliff Click's work might help, but it's far from a guarantee.  Click's work essentially throws away the PRE constraint about never inserting an expression evaluation on a path where it didn't exist, along with all kinds of other things.  Essentially it's a total reformulation of redundancy elimination.


I did an implementation eons ago in gimple, but I was never able to convince myself the implementation was correct or that integrating it was a good thing.  It's almost certainly going to cause performance regressions elsewhere, so it may end up doing more harm than good.  I don't really know.


https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.pdf


Jeff


