On Wed, 10 Aug 2022, Richard Biener wrote:

> On Tue, 9 Aug 2022, Andrew MacLeod wrote:
> 
> > 
> > On 8/9/22 09:01, Richard Biener wrote:
> > > This revisits how we compute imports later used for the ranger path
> > > query during backwards threading.  The compute_imports function
> > > of the path solver ends up pulling the SSA def chain of regular
> > > stmts without limit and since it starts with just the gori imports
> > > of the path exit it misses some interesting names to translate
> > > during path discovery.  In fact with a still empty path this
> > > compute_imports function looks like not the correct tool.
> > 
> > I don't really know how this works in practice.  Aldys off this week, so he
> > can comment when he returns.
> > 
> > The original premise was along the line of recognizing that only changes to 
> > a
> > GORI import name to a block can affect the branch at the end of the block. 
> > ie, if the path doesn't change any import to block A, then the branch at the
> > end of block A will not change either.    Likewise, if it does change an
> > import, then we look at whether the branch can be threaded.    Beyond that
> > basic premise, I dont know what all it does.
> 
> Yep, I also think that's the idea.

[...]

> Anyway, the important result of the change is that the imports
> set is vastly smaller since it is now constrained to the
> actual path.  Plus it now contains the local defs in blocks
> of the path that do not dominate the exit block which means
> it might get more threading opportunities.  Which reminds me
> to re-do the cc1files experiment for this change.

Results are a bit unconclusive.  We have
ethread 14044 -> 14058, first threadfull 36986 -> 37174,
first thread 8060 -> 8049, dom 21723 -> 21822, second thread
(after loop opts) 6242 -> 5582, second dom 8522 -> 8765,
second threadfull 3072 -> 2998 which makes an overall drop
in the number of threads from 98649 to 98448.

I've isolated one testcase we threaded before but no longer after
this change and that is

void foo (int nest, int print_nest)
{
  _Bool t0 = nest != 0;
  _Bool t1 = nest == print_nest;
  _Bool t2 = t0 & t1;
  if (t2)
    __builtin_puts ("x");
  nest++;
  if (nest > 2)
    __builtin_abort ();
  if (print_nest == nest)
    __builtin_puts ("y");
}

where we are able to thread from if (t2) to if (print_nest == nest)
resolving that to false when t2 is true using the nest == print_nest
relation.  Now, the reason is because of the imports added by

  // Exported booleans along the path, may help conditionals.
  if (m_resolve)
    for (i = 0; i < m_path.length (); ++i)
      {   
        basic_block bb = m_path[i];
        tree name;
        FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
          if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
            bitmap_set_bit (imports, SSA_NAME_VERSION (name));
      }

_BUT_ - it turns out that the m_path local to the solver is still
the one from the last back_threader::find_taken_edge_switch
calling m_solver->compute_ranges (path, m_imports), so for a possibly
completely unrelated path.  Truncating the path also on the solver
side before calling compute_imports makes the testcase fail to thread.
I'll note the PHI handling also looks at the stale m_path.

I see the solver itself adds relations from edges on the path so
the cruical item here seems to be to add imports for the path
entry conditional, but those would likely be GORI imports for that
block?  Unfortunately that fails to add t[012], the GORI exports
seem to cover all that's needed but then exports might be too much
here?  At least that's what the code in compute_imports effectively
does (also for the entry block).  But I do wonder why compute_ranges
does not add relations computed by the entry conditional ...
That's probably because of

void
path_range_query::compute_outgoing_relations (basic_block bb, basic_block 
next)
{
  gimple *stmt = last_stmt (bb);

  if (stmt
      && gimple_code (stmt) == GIMPLE_COND
      && (import_p (gimple_cond_lhs (stmt))
          || import_p (gimple_cond_rhs (stmt))))

where for a condition like above the names in the imports are not
in the condition itself.  The gori.outgoing_edge_range_p compute
of the import ranges is appearantly not affected by this
restriction and picks up nest as ~[0, 0] on the respective path
even though, while 'nest' is in imports, the temporaries are not.
That would support the argument to drop the import_p checks in
path_range_query::compute_outgoing_relations.

I'm not sure it's worth fixing incrementally though, it's fallout
of r12-5157-gbfa04d0ec958eb that in some way did the reverse of
my patch.  Interestingly hybrid_jt_simplifier::compute_ranges_from_state
still computes imports itself before calling compute_ranges.

For comparing thread numbers fixing the current state incrementally
would be nice I guess, I will test something but not further pursue it
if not successful.

Richard.

Reply via email to