Re: [PATCH] Improve stack variable reuse with inlining with exceptions (PR tree-optimization/86214)

Jakub Jelinek Thu, 17 Jan 2019 06:44:20 -0800

On Thu, Jan 17, 2019 at 03:06:57PM +0100, Richard Biener wrote:
> > +   tree new_var = *t;
> > +   if (VAR_P (new_var)
> > +       && !DECL_HARD_REGISTER (new_var)
> > +       && !TREE_THIS_VOLATILE (new_var)
> > +       && !DECL_HAS_VALUE_EXPR_P (new_var)
> > +       && !is_gimple_reg (new_var)
> > +       && auto_var_in_fn_p (new_var, id->dst_fn))
> > +     {
> > +       gimple_stmt_iterator gsi = gsi_after_labels (bb);
> > +       tree clobber = build_clobber (TREE_TYPE (new_var));
> > +       gimple *clobber_stmt = gimple_build_assign (new_var, clobber);
> > +       gsi_insert_before (&gsi, clobber_stmt, GSI_NEW_STMT);
> > +     }
> > +      }
> 
> So we do not care to optimize this to only clobber the vars that
> are appear live over the EH edge?


Wouldn't that be quite expensive (especially at that spot in the inliner)?
I could surely defer that (at the spot in copy_edges_for_bb just remember
which bb was it and handle it before destroying the decl_map in
expand_call_inline, but:
1) are the extra CLOBBERs that big a deal?
2) if they are, shouldn't we have a pass that does it generically after IPA
   and removes all CLOBBERs for vars already known dead, whether they come
   from inlining or whatever else (we have many in the IL already that are
   dead for other reasons, e.g. a variable where the destructor is inlined
   and has one CLOBBER in itself, but have another CLOBBER in the caller too
   when the var goes out of scope); tree DSE is able to remove CLOBBERs for
   the same var if they are adjacent; in either case, what would be the
   algorithm for that?  Something like add_scope_conflicts algorithm, just
   not build any conflicts, but just propagate the var is live bitmaps
   and when the propagation terminates, go through all bbs and if it sees
   a clobber on a var that isn't live at that point, remove the clobber?

> > @@ -2317,8 +2351,16 @@ copy_edges_for_bb (basic_block bb, profi
> >           e->probability = old_edge->probability;
> >         
> >            FOR_EACH_EDGE (e, ei, copy_stmt_bb->succs)
> > -       if ((e->flags & EDGE_EH) && !e->probability.initialized_p ())
> > -         e->probability = profile_probability::never ();
> > +       if (e->flags & EDGE_EH)
> > +         {
> > +           if (!e->probability.initialized_p ())
> > +             e->probability = profile_probability::never ();
> > +           if (e->dest->index < id->add_clobbers_to_eh_landing_pads)
> > +             {
> 
> Why don't we need to watch out for EDGE_COUNT (e->dest->preds) != 1?

That is the usual case.

> Is it that we just don't care?

That was the intention, if say the EH pad has 4 predecessors,
3 of them from 3 different inline functions and the last one from something
else, then the alternatives to what the patch does (i.e. have clobbers
for variables from all those 3 inline functions, the EH pad is initially
in the caller, so the vars from those inline functions can't be magically
still live there) would be either do nothing and keep this PR open and
have inlining increase stack sizes much more than it expects to in its
estimations, or split those EH pads, have special EH pad for one inline
function and have that set of clobber there, another one for another etc.
and branch to the same spot.  But that means worse code generation unless we
optimize that away (and, either we optimize that by throwing the CLOBBERs
out and we are back to this PR, or we optimize them by moving all the
CLOBBERs into the same EH pad (essentially sinking the CLOBBERs to the
common fallthru, but that means what this patch does)).

Note, analysis of what variables are dead could be useful e.g. for
tail-merge pass, e.g. on:
int bar (int n);
void baz (char *x, int n);

int
foo (int x, int y)
{
  switch (x)
    {
    case 1:
      {
        char buf[8192];
        baz (buf, 4);
        if (y != 5)
          return 4;
        bar (2);
        bar (3);
        bar (4);
        bar (5);
        bar (6);
        bar (7);
        bar (9);
        break;
      }
    case 2:
      {
        char buf[8192];
        baz (buf, 124);
        if (y != 7)
          return 19;
        bar (2);
        break;
      }
    case 3:
      {
        char buf[8192];
        baz (buf, 51);
        if (y != 19)
          return 26;
        bar (2);
        bar (3);
        bar (4);
        bar (5);
        bar (6);
        bar (7);
        bar (9);
        break;
      }
    default:
      {
        char buf[8192];
        baz (buf, 18);
        if (y != 21)
          return 28;
        bar (2);
        bar (3);
        bar (4);
        bar (5);
        bar (6);
        bar (7);
        bar (9);
        break;
      }
    }
  bar (8);
  return 7;
}

we don't optimize this at -O2, but do with -O2 -fstack-reuse=none when
we don't have the CLOBBERs around.  If tail-merge pass could run this
analysis and find out that while the basic blocks under consideration have
different sets of clobbers, the vars from clobbers from the other sets are dead
in the bb not having those, thus it should be possible to union the clobber
sets and tail-merge because of that.

        Jakub

Re: [PATCH] Improve stack variable reuse with inlining with exceptions (PR tree-optimization/86214)

Reply via email to