On 9/27/21 5:01 PM, Jeff Law wrote:


On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
This patch implements the new hybrid forward threader and replaces the
embedded VRP threader with it.
But most importantly, it pulls it out of the VRP pass as we no longer need the VRP data or ASSERT_EXPRs.

Yes, I have a follow-up patch removing the old mini-pass.



With all the pieces that have gone in, the implementation of the hybrid
threader is straightforward: convert the current state into
SSA imports that the solver will understand, and let the path solver
precompute ranges and relations for the path.  After this setup is done,
we can use the range_query API to solve gimple statements in the threader.
The forward threader is now engine agnostic so there are no changes to
the threader per se.
So the big question is do we think it's going to be this clean when we try to divorce the threading from DOM?

Interestingly, yes. With all the refactoring I've done, it turns out that divorcing evrp from the DOM threader is a matter of having dom_jt_simplifier inherit from hybrid_jt_simplifier instead of the base class. Then we have simplify() look at the const_copies/avails, otherwise let the hybrid simplifier do its thing. Yes, I was amazed too.
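
To make the shape of that concrete, here is a toy, self-contained C++ sketch of the layering described above. It is not the GCC code: every type and member below is a hypothetical stand-in that only borrows names from this thread, and the "ranges" are plain integer pairs rather than anything the path solver actually produces. The DOM-side simplify() consults its own const_copies table first and otherwise defers to the hybrid simplifier it inherits from.

```cpp
// Toy model only: these are NOT the GCC classes.  Names are borrowed from
// the thread; every type and member here is a hypothetical stand-in.
#include <cassert>
#include <map>
#include <string>

// Stand-in for the base simplifier interface.
struct jt_simplifier_sketch {
  virtual ~jt_simplifier_sketch() = default;
  // Returns true and sets VALUE if NAME is known to be a constant.
  virtual bool simplify(const std::string &name, int &value) = 0;
};

// Stand-in for the hybrid simplifier: answers from per-path ranges.
struct hybrid_jt_simplifier_sketch : jt_simplifier_sketch {
  struct range { int lo, hi; };
  std::map<std::string, range> path_ranges;  // assumed precomputed per path

  bool simplify(const std::string &name, int &value) override {
    auto it = path_ranges.find(name);
    if (it == path_ranges.end())
      return false;
    assert(it->second.lo <= it->second.hi);  // toy invariant: no inverted ranges
    // Only a singleton range folds to a constant.
    if (it->second.lo != it->second.hi)
      return false;
    value = it->second.lo;
    return true;
  }
};

// Stand-in for the DOM simplifier: consult DOM's own table first,
// otherwise let the inherited hybrid simplifier do its thing.
struct dom_jt_simplifier_sketch : hybrid_jt_simplifier_sketch {
  std::map<std::string, int> const_copies;  // stand-in for const_copies/avails

  bool simplify(const std::string &name, int &value) override {
    auto it = const_copies.find(name);
    if (it != const_copies.end()) {
      value = it->second;
      return true;
    }
    return hybrid_jt_simplifier_sketch::simplify(name, value);
  }
};
```

In this toy, a name found in const_copies is answered without touching the ranges at all, which mirrors the ordering described above; everything else falls through to the range-based lookup.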

As usual there are caveats:

First, notice that we'd still depend on const_copies/avails, because we'd need them for floats anyhow. But this has the added benefit of catching a few things in the presence of the IL changing from under us.

Second, it turns out that DOM has other uses of evrp that need to be addressed, particularly its use of evrp to do its simple copy prop.

Be that as it may, none of these are show stoppers. I have a proof of concept that converts everything with a few lines of code.

The big issue now is performance. Plugging in the full ranger makes it uncomfortably slower than just using evrp. Andrew has some ideas for a super fast ranger that doesn't do full look-ups, so we have finally found a good use case for something we had on the back burner.

Now, numbers...

Converting the DOM threader to a hybrid client improves DOM threading counts by 4%, but it's all at the expense of other passes. The total threading count was essentially unchanged (well, it got worse by 0.05%). It doesn't look like there's any gain. We're shuffling things around at this point.



I have put the hybrid bits in tree-ssa-threadedge.*, instead of VRP,
because they will also be used in the evrp removal of the DOM/threader,
which is my next task.
Sweet.


Most of the patch is actually test changes.  I have gone through every
single one and verified that we're correct.  Most were trivial dump
file name changes, but others required going through the IL and
certifying that the different IL was expected.

For example, in pr59597.c, we have one less thread because the
ASSERT_EXPR was getting in the way, and making it seem like things were
not crossing loops.  The hybrid threader sees the correct representation
of the IL, and avoids threading this one case.

The final numbers are a 12.16% improvement in jump threads immediately
after VRP, and a 0.82% improvement in overall jump threads.  The
performance drop is 0.6% (plus the 1.43% hit from moving the embedded
threader into its own pass).  As I've said, I'd prefer to keep the
threader in its own pass, but if this is an issue, we can address this
with a shared ranger when VRP is replaced with an evrp instance
(upcoming).
Presumably we're also seeing a cannibalization of threads from later passes.   And just to be clear, this is good.

And the big question, is the pass running after VRP2 doing anything particularly useful?  Do we want to try and kill it now, or later?

Interesting question. Perhaps if we convert DOM threading to a hybrid model, it will render the post-VRP threader completely useless. Huhh... That could kill two birds with one stone... we get rid of a threading pass, and we don't need to worry as much about the super-fast ranger.

Huh...good idea.  I will experiment.

Thanks.
Aldy
