On 8/6/24 09:12, Richard Biener wrote:
Currently the forward threader isn't limited as to the search space
it explores and with it now using path-ranger for simplifying
conditions it runs into it became pretty slow for degenerate cases
like compiling insn-emit.cc for RISC-V esp. when compiling for
a host with LOGICAL_OP_NON_SHORT_CIRCUIT disabled.
The following makes the forward threader honor the search space
limit I introduced for the backward threader. This reduces
compile-time from minutes to seconds for the testcase in PR116166.
Note this wasn't necessary before we had ranger but with ranger
the work we do is quadatic in the length of the threading path
we build up (the same is true for the backwards threader).
Theres probably some work that can be done in the path processing space
using the new gori_on_edge interface I introduced for the fast VRP pass.
// These APIs are used to query GORI if there are ranges generated on an
edge.
// GORI_ON_EDGE is used to get all the ranges at once (returned in an
// ssa_cache structure).
// Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = NULL);
With this, the threader and path calculator can get and collect all the
outgoing ranges for a block in linear time and just keep them.. and
decide what it wants to use. I suspect for really large CFGs, we'd
want to substitute and alternative ssa_cache implementation to something
like the sbr_sparse_bitmap class ranger's cache uses which compresses
the size of the vector so it isn't a vector over all the ssa-names, and
at the same time limit it to a max of 14 outgoing ranges.
no one has had any time to investigate that yet.
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
OK if that succeeds?
OK with me.
Andrew