heh. or just

+      int_range<2> r;
+      if (!fold_range (r, const_cast <gcond *> (cond_stmt))
+      || !r.singleton_p (&val))


If you do not provide a range_query to any of the fold_using_range code, it defaults to:

fur_source::fur_source (range_query *q)
{
  if (q)
    m_query = q;
  else if (cfun)
    m_query = get_range_query (cfun);
  else
    m_query = get_global_range_query ();
  m_gori = NULL;
}

It will use the query you provide; if there isn't one, it tries the cfun version when cfun is available, and otherwise it falls back to the global range query.  So you don't need to provide the cfun version yourself.

This applies to the 5 basic routines in gimple-range-fold.h:

// Fold stmt S into range R using range query Q.
bool fold_range (vrange &r, gimple *s, range_query *q = NULL);
// Recalculate stmt S into R using range query Q as if it were on edge ON_EDGE.
bool fold_range (vrange &v, gimple *s, edge on_edge, range_query *q = NULL);

// These routines allow the operands to be specified when manually folding.
// Any excess queries will be drawn from the current range_query.
bool fold_range (vrange &r, gimple *s, vrange &r1);
bool fold_range (vrange &r, gimple *s, vrange &r1, vrange &r2);
bool fold_range (vrange &r, gimple *s, unsigned num_elements, vrange **vector);
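
For instance, a rough sketch of a caller (stmt, op1r and op2r here are
placeholders, not names from this thread):

  // Fold STMT's result into R without passing a query; fur_source then
  // falls back to get_range_query (cfun), or to the global query when
  // no cfun is available.
  int_range<2> r;
  if (fold_range (r, stmt))        // same as fold_range (r, stmt, NULL)
    {
      // r holds whatever the default query could compute.
    }

  // Or supply the operand ranges yourself; anything else the fold
  // needs is drawn from the current range_query.
  int_range<2> res;
  if (fold_range (res, stmt, op1r, op2r))
    {
      // res is the folded range of STMT given op1r and op2r.
    }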

Andrew


On 8/15/22 15:09, Aldy Hernandez wrote:
On Mon, Aug 15, 2022 at 11:39 AM Richard Biener <rguent...@suse.de> wrote:
On Fri, 12 Aug 2022, Aldy Hernandez wrote:

On Fri, Aug 12, 2022 at 2:01 PM Richard Biener <rguent...@suse.de> wrote:
This started with noticing we add ENTRY_BLOCK to our threads
just for the sake of simplifying the conditional at the end of
the first block in a function.  That's not really threading
anything but it ends up duplicating the entry block, and
re-writing the result instead of statically folding the jump.
Hmmm, but threading 2 blocks is not really threading at all??  Unless
I'm misunderstanding something, this was even documented in the
backwards threader:

[snip]
      That's not really a jump threading opportunity, but instead is
      simple cprop & simplification.  We could handle it here if we
      wanted by wiring up all the incoming edges.  If we run this
      early in IPA, that might be worth doing.   For now we just
      reject that case.  */
   if (m_path.length () <= 1)
       return false;

Which you undoubtedly ran into because you're specifically eliding the check:

-         if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
-                                         &irreducible)
+         if ((m_path.length () == 1
+              || m_profit.profitable_path_p (m_path, m_name, taken_edge,
+                                             &irreducible))
Correct.  But currently the threader just "cheats", picks up one more
block and then "threads" the case anyway, doing this simple simplification
in the most expensive way possible ...
Ah.

The following tries to handle those by recording simplifications
of the exit conditional as a thread of length one.  That requires
special-casing them in the backward copier: if we do not have
any block to copy but instead modify the jump in place and remove
the not-taken edges, this confuses the hell out of the remaining threads.

So back_jt_path_registry::update_cfg now first marks all
edges we know are never taken and then prunes the threading
candidates when they include such edge.  Then it makes sure
to first perform unreachable edge removal (so we avoid
copying them when other thread paths contain the prevailing
edge) before continuing to apply the remaining threads.
This is all beyond my pay grade.  I'm not very well versed in the
threader per se.  So if y'all think it's a good idea, by all means.
Perhaps Jeff can chime in, or remembers the above comment?

In statistics you can see this avoids quite a bunch of useless
threads (I've investigated 3 random files from cc1files with
dropped stats in any of the thread passes).

Still thinking about it, it would be nice to avoid the work of
discovering those candidates we have to throw away later
which could eventually be done by having the backward threader
perform a RPO walk over the CFG, skipping edges that can be
statically determined as not being executed.  Below I'm
abusing the path range query to statically analyze the exit
branch but I assume there's a simpler way of folding this stmt
which could then better integrate with such a walk.
Unreachable paths can be queried with
path_range_query::unreachable_path_p ().  Could you leverage this?
The idea was that if we ever resolved any SSA name to UNDEFINED, the
path itself was unreachable.
The situation is that we end up with paths where an intermediate
branch on the path can be simplified to false - but of course only
if we put all intermediate branch dependences into the list of
imports to consider.

I don't much like using the "threading" code to perform the
simplification, but I couldn't figure out a cheap way to do it
without invoking a full EVRP?  That said,
the backwards threader simply does

   basic_block bb;
   FOR_EACH_BB_FN (bb, m_fun)
     if (EDGE_COUNT (bb->succs) > 1)
       maybe_thread_block (bb);

   bool changed = m_registry.thread_through_all_blocks (true);

instead of that we should only consider edges that may be executable,
by walking the CFG along such edges and simplifying BB exit
conditionals.  Unfortunately EVRP is now a C++ maze, so I couldn't
find how to actually do such simplification, not knowing how
interacting with ranger influences the path query use either.
If you or Andrew have any suggestions on how to essentially
do a

   if (edge e = find_taken_edge (bb))
     {
...
     }

where find_taken_edge should be at least as powerful as using
the path query for a single bb, then I'd be all ears.  As said,
I tried to find the code to cut&paste in EVRP but failed to ...
Interesting... If what you need is find_taken_edge(bb), I think we can
do this quite cleanly.

What you're asking is to fold the conditional at the end of a basic
block, and return the edge depending on the folded value.  We have all
the smarts to do the folding (fold_range), and tree-cfg.c has
find_taken_edge(basic_block, tree val), which, AFAICT, does everything
except the folding of non-trivial statements.
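
Roughly, the idea is something like this (a sketch only, not the attached
patch; the helper name is made up):

  // Sketch: fold the gcond that ends BB with fold_range and, if it
  // folds to a constant, let the existing find_taken_edge (bb, val)
  // from tree-cfg.c pick the outgoing edge.
  static edge
  find_taken_edge_from_ranges (basic_block bb)   // hypothetical helper
  {
    gcond *cond = safe_dyn_cast <gcond *> (last_stmt (bb));
    if (!cond)
      return NULL;

    int_range<2> r;
    tree val;
    // No explicit range_query, so this uses the cfun/global query, or
    // an active ranger if one has been enabled.
    if (!fold_range (r, cond) || !r.singleton_p (&val))
      return NULL;

    return find_taken_edge (bb, val);
  }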

If you like, I could tweak find_taken_edge() to use a ranger if
available (pass has called enable_ranger()), or otherwise use the
global range query in cfun (SSA_NAME_RANGE_INFO but with the
range_query API).  This should be as fast as poking at
SSA_NAME_RANGE_INFO for the common case, or using an active ranger if
available.

See the attached proof of concept.

With this approach we could handle everything find_taken_edge(bb)
currently handles, plus anything we can fold in ranger (without the
penalty of a full ranger if you don't want to-- we can still fold and
use global ranges for SSA names if available).  Or ultimately, if you
have an active ranger, you can leverage the full ranger functionality.
Either way is a lot better than the 1 != 0, etc folding the code
currently does ;-).

If you want to fold these blocks from the threader, we could expose
the ranger in the path_query with enable_ranger(), and then
find_taken_edge(basic_block) can pick the ranger up.  Up to you if you
want to run this within the threader, or elsewhere.

Does this make sense?

I'm happy to cobble it up if you find it useful.
Aldy
