Re: [PATCH] tree-optimization/116166 - forward jump-threading going wild

Andrew MacLeod Fri, 09 Aug 2024 12:25:55 -0700


On 8/7/24 03:13, Richard Biener wrote:

On Tue, 6 Aug 2024, Andrew MacLeod wrote:

On 8/6/24 09:12, Richard Biener wrote:

Currently the forward threader isn't limited as to the search space
it explores and with it now using path-ranger for simplifying
conditions it runs into it became pretty slow for degenerate cases
like compiling insn-emit.cc for RISC-V esp. when compiling for
a host with LOGICAL_OP_NON_SHORT_CIRCUIT disabled.

The following makes the forward threader honor the search space
limit I introduced for the backward threader.  This reduces
compile-time from minutes to seconds for the testcase in PR116166.

Note this wasn't necessary before we had ranger but with ranger
the work we do is quadatic in the length of the threading path
we build up (the same is true for the backwards threader).

Theres probably some work that can be done in the path processing space using
the new gori_on_edge interface I introduced for the fast VRP pass.

// These APIs are used to query GORI if there are ranges generated on an edge.
// GORI_ON_EDGE is used to get all the ranges at once (returned in an
// ssa_cache structure).
// Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = NULL);

With this, the threader and path calculator can get and collect all the
outgoing ranges for a block in linear time and just keep them.. and decide
what it wants to use.   I suspect for really large CFGs, we'd want to
substitute and alternative ssa_cache implementation to something like the
sbr_sparse_bitmap class ranger's  cache uses which compresses the size of the
vector so it isn't a vector over all the ssa-names, and at the same time limit
it to  a max of 14 outgoing ranges.

no one has had any time to investigate that  yet.

The thing that path-ranger handles in addition to what range_on_edge
gives you is CFG merges where knowing the path into it improves
ranges downstream.  Basically path-ranger improves the range of PHIs
based on known path taken.  Whenever the decision which edge(s) of
a PHI are taken changes we'd have to wipe cached information.

In PR114855 the gori compute itself shows scalability issues
(I've identified the dominator walk sofar but it might not be
the full story).

The real story is that we do not cache edge calculations, thus thescalability issues with path-ranger. Ranger caches on-entry valueswhich integrates these edge values, and thus avoids recalculating edges,so there was never a need to cache edges.

After the 2 patches I just committed, the remainder of time at -O1 inPR114855 is spend by the path ranger recalculating ranges on edges overand over. Something like previously mentioned the gori_on_edge ()approach might allow the path ranger to calculate these static edgevalues only once per edge, and cache them somewhere. There would be onessa_cache class structure per edge you cared about, and it wouldinstantly tell you the static range of any ssa-name on the edge... ifhas_edge_range_p() is true.

Thinking about it the last couple of days. I think we can do better. Ithink I can integrate that into GORI proper so that edges can optionallybe cached as well, and this should alleviate at least some of thecompile time pressure caused by the lack of caching. It'd be a bit ofan experiment, but it dovetails nicely with some range-logging work I'minvestigating.

I'm on vacation next week, so It'll be at least a week before I can getlook at it closer. I think Aldy is back then too, so maybe we can figuresomething out for the remaining compile time issues in DOM.


Andrew

Re: [PATCH] tree-optimization/116166 - forward jump-threading going wild

Reply via email to