On 10/18/22 15:51, Vineet Gupta wrote:
Where BB4 corresponds to .L2 and BB6 corresponds to .L3. Evaluation
of the constants occurs in BB3 and BB5.
And "evaluation" here means use of the constant (vs. definition)?
In this case, use of the constant.
PRE/GCSE is better suited for this scenario, but it has a critical
constraint. In particular our PRE formulation is never allowed to
put an evaluation of an expression on a path that didn't have one
before. So
while there is clearly a redundancy on the path 2->3->4->5 (BB3 and
BB5), there is nowhere we could put an evaluation that would reduce
the number of evaluations on that path without introducing an
evaluation on paths that didn't have one. So consider 2->4->6. On
that path there are zero evaluations, so we can't place an eval in
BB2: that would introduce an evaluation on 2->4->6, which previously
had none.
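The path argument above can be sketched concretely. This is a toy model, not GCC code; the block numbers and the set of evaluating blocks mirror the example in this thread:

```python
# Toy model of the CFG discussed above (not GCC code): BB2 branches to
# BB3/BB4, BB4 branches to BB5/BB6, and only BB3 and BB5 evaluate the
# constant.
succs = {2: [3, 4], 3: [4], 4: [5, 6], 5: [], 6: []}
evals = {3, 5}  # blocks that evaluate the expression

def paths(block, path=()):
    """Enumerate all root-to-exit paths in the acyclic CFG."""
    path += (block,)
    if not succs[block]:
        yield path
    for s in succs[block]:
        yield from paths(s, path)

for p in paths(2):
    print("->".join(map(str, p)), "evals:", sum(b in evals for b in p))
# 2->3->4->5 evaluates twice (the redundancy), while 2->4->6 evaluates
# zero times, so hoisting an evaluation into BB2 would add one to a
# path that previously had none, which is exactly what PRE forbids.
```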
OK. How does PRE calculate all the possible paths to consider: say,
in your example, 2->3->4->5 and 2->4->6? Are those just indicative, or
are they actually the ones PRE calculates in this case? Would there be
more?
PRE has a series of dataflow equations it solves which gives it the
answer to that question. The one that computes this property is usually
called anticipated. Given some block B in a graph G, an expression is
anticipated at B when the expression is guaranteed to be computed if we
reach B. That doesn't mean the evaluation must happen in B, just that
evaluation at some point is guaranteed if we reach B.
If an expression is not anticipated in a block, then PRE will not
insert in that block, since doing so would add evaluations on paths
that previously had none.
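A minimal sketch of that anticipated property, assuming a single expression and no blocks that kill it (the block numbers again mirror the CFG in this thread; this is an illustration, not GCC's actual implementation):

```python
# Backward dataflow: an expression is anticipated on entry to a block
# iff that block uses it, or every successor anticipates it on entry.
succs = {2: [3, 4], 3: [4], 4: [5, 6], 5: [], 6: []}
used = {3, 5}  # blocks that evaluate the expression

antic_in = {b: False for b in succs}
changed = True
while changed:
    changed = False
    for b in succs:
        # ANTIC_OUT[b]: anticipated on entry to every successor
        # (False at the exit blocks, which have no successors).
        antic_out = bool(succs[b]) and all(antic_in[s] for s in succs[b])
        new = b in used or antic_out
        if new != antic_in[b]:
            antic_in[b] = new
            changed = True

print(antic_in)
# Only BB3 and BB5 anticipate the expression; BB2 does not (the path
# through BB6 never evaluates it), so PRE will not insert in BB2.
```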
There isn't a great place in GCC to handle this right now. If the
constraints were relaxed in PRE, then we'd have a chance, but getting
the cost model right is going to be tough.
It would have been better (for this specific case) if loop unrolling
were not done so early. The tree pass cunroll flattens the loop out
and leaves it to the rest of the tree/RTL passes to pick up the
pieces and remove any redundancies, if they can. It obviously needs
to be early if we are injecting 7x more instructions, but it seems
like a lot to unravel.
Yup. If that loop gets unrolled, it's going to be a mess. It will
almost certainly make this problem worse as each iteration is going to
have a pair of constants loaded and no good way to remove them.
FWIW, -fno-unroll-loops only seems to work at -O2; at -O3 it always
unrolls. Is that expected?
The only case I'm immediately aware of where this wouldn't work would be
if -O3 came after -fno-unroll-loops.
If this seems worthwhile and you have ideas to do this any better, I'd
be happy to work on this with some guidance.
I don't see a great solution here. Something like Cliff Click's work
might help, but it's far from a guarantee. Click's work essentially
throws away the PRE constraint about never inserting an expression
evaluation on a path where it didn't exist, along with all kinds of
other things. Essentially it's a total reformulation of redundancy
elimination.
I did an implementation eons ago in gimple, but was never able to
convince myself the implementation was correct or that integrating it
was a good thing. It's almost certainly going to cause performance
regressions elsewhere, so it may end up doing more harm than good. I
don't really know.
https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.pdf
Jeff