> > 
> > Are you sure?  I thought we compute that up-front ... at least
> > we make sure we can_inline_edge_p all calls before even trying
> > to inline all calls - that looks somewhat redundant then if it
> > can happen that we give up anyway.  Ah, so can_inline_edge_p has
> > 
> >   /* Check if caller growth allows the inlining.  */
> >   else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
> >            && !disregard_limits
> >            && !lookup_attribute ("flatten",
> >                                  DECL_ATTRIBUTES (caller->decl))
> >            && !caller_growth_limits (e))
> >     inlinable = false;
> > 
> > and we check that from when we do the inlining and upfront from

> > want_inline_function_to_all_callers_p ... we also give up
> > if we inline a recursive function it seems.
Yep, there is early check for recursion and sizes. But you can still hit the
size limits later. If a function A calls 1000 times function B, each of the
calls individually may be inlinable, but not all together.
> > 
> > > For next stage1 I plan to finish the overahaul to sreals that will let
> > > me to make update_overall_summary incremental (i.e. 
> > > O(size_of_inlined_function)
> > > and not O(size_of_inlined_function+size_of_function_inlined_to)) that will
> > > also solve these issues.
> > 
> > Hopefully.
> > 
> > > One can probably do a pesimistic estimate on code size of the function 
> > > (i.e. add the size of inlined function) and only if that hits the large 
> > > function growth do the exact calculation. Or we can just decide that it 
> > > is OK to ignore large function growht in this partiuclar case.
> > 
> > Ignoring it is probably not a good idea and will just lead to a different
> > degenerate case.  As we update the summary afterwards it's probably
> > ok to do incremental (pessimistic) updates on the size anyway?  In
> > inline_merge_summary I mean?  Or should I open-code that into
> > inline_call if ! update_overall_summary?
> 
> So like below - I update self-size by the growth estimate which is
> supposed to match, this should keep the overall function growth
> limits working, not sure about the stack stuff but that doesn't seem
> to be updated by update_overall_summaries either.

Not really, the sizes are also computed in fixed point math, because we 
estimate probabilities
when insn is going to be removed. There are some cases wher estimate_growth is 
not as precise
as the real thing, you can see the #if 0 out asstert in ipa-inline-transform 
abou tthat.
Well, for inline functions called once, I suppose precision of these details is 
not as important,
so we may just go with the patch. 
I will finish the sreal conversion next stage1 (I got stuck on this patch 
series on gengtype change
to allow sreal inline_summary which I do not think ever got reviewed) and do 
the incremental update.
So if you fell this PR is important, go ahead with the patch.

Honza
> 
> This also fixes a similar issue in early inlining where we can then
> trivially delay update_overall_summaries.  I remember seeing early
> inline analysis behave similarly quadratic.
> 
> It leaves alone recursive and IPA small function inlining as those
> update overall unit size as well - I first thought about somehow
> making overall summary update lazy, but that looks quite complicated
> and IIRC the round-off errors issue was when re-computing the
> key for the fibheap after doing one inlining, right?
> 
> I hope you can get to use sreals early in next stage1 though I wonder
> how that will avoid all corner-cases.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.  Ok for trunk
> and branch?
> 
> Thanks,
> Richard.
> 
> 2016-02-18  Richard Biener  <rguent...@suse.de>
> 
>       PR ipa/37448
>       * ipa-inline-transform.c (inline_call): When not updating
>       overall summaries adjust self size by the growth estimate.
>       * ipa-inline.c (inline_to_all_callers_1): Add to the callers
>       hash-set, do not update overall summaries here.  Renamed from ...
>       (inline_to_all_callers): ... this which is now wrapping the
>       above and performing delayed overall summary update.
>       (early_inline_small_functions): Delay updating of the overall
>       summary.
> 
> Index: gcc/ipa-inline-transform.c
> ===================================================================
> --- gcc/ipa-inline-transform.c        (revision 233498)
> +++ gcc/ipa-inline-transform.c        (working copy)
> @@ -300,10 +300,12 @@ inline_call (struct cgraph_edge *e, bool
>    struct cgraph_edge *curr = e;
>    struct cgraph_node *callee = e->callee->ultimate_alias_target ();
>    bool new_edges_found = false;
> +  int estimated_growth;
>  
> +  if (! update_overall_summary)
> +    estimated_growth = estimate_edge_growth (e);
>    /* This is used only for assert bellow.  */
>  #if 0
> -  int estimated_growth = estimate_edge_growth (e);
>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>  #endif
>  
> @@ -373,7 +375,13 @@ inline_call (struct cgraph_edge *e, bool
>      new_edges_found = ipa_propagate_indirect_call_infos (curr, new_edges);
>    check_speculations (e->callee);
>    if (update_overall_summary)
> -   inline_update_overall_summary (to);
> +    inline_update_overall_summary (to);
> +  else
> +    /* Update self size by the estimate so overall function growth limits
> +       work for further inlining into this function.  Before inlining
> +       the function we inlined to again we expect the caller to update
> +       the overall summary.  */
> +    inline_summaries->get (to)->size += estimated_growth;
>    new_size = inline_summaries->get (to)->size;
>  
>    if (callee->calls_comdat_local)
> Index: gcc/ipa-inline.c
> ===================================================================
> --- gcc/ipa-inline.c  (revision 233498)
> +++ gcc/ipa-inline.c  (working copy)
> @@ -2163,7 +2163,8 @@ flatten_function (struct cgraph_node *no
>     recursion.  */
>  
>  static bool
> -inline_to_all_callers (struct cgraph_node *node, void *data)
> +inline_to_all_callers_1 (struct cgraph_node *node, void *data,
> +                      hash_set<cgraph_node *> *callers)
>  {
>    int *num_calls = (int *)data;
>    bool callee_removed = false;
> @@ -2193,7 +2194,10 @@ inline_to_all_callers (struct cgraph_nod
>                  inline_summaries->get (node->callers->caller)->size);
>       }
>  
> -      inline_call (node->callers, true, NULL, NULL, true, &callee_removed);
> +      /* Remember which callers we inlined to, delaying updating the
> +      overall summary.  */
> +      callers->add (node->callers->caller);
> +      inline_call (node->callers, true, NULL, NULL, false, &callee_removed);
>        if (dump_file)
>       fprintf (dump_file,
>                " Inlined into %s which now has %i size\n",
> @@ -2211,6 +2215,23 @@ inline_to_all_callers (struct cgraph_nod
>    return false;
>  }
>  
> +/* Wrapper around inline_to_all_callers_1 doing delayed overall summary
> +   update.  */
> +
> +static bool
> +inline_to_all_callers (struct cgraph_node *node, void *data)
> +{
> +  hash_set<cgraph_node *> callers;
> +  bool res = inline_to_all_callers_1 (node, data, &callers);
> +  /* Perform the delayed update of the overall summary of all callers
> +     processed.  This avoids quadratic behavior in the cases where
> +     we have a lot of calls to the same function.  */
> +  for (hash_set<cgraph_node *>::iterator i = callers.begin ();
> +       i != callers.end (); ++i)
> +    inline_update_overall_summary (*i);
> +  return res;
> +}
> +
>  /* Output overall time estimate.  */
>  static void
>  dump_overall_stats (void)
> @@ -2590,10 +2611,13 @@ early_inline_small_functions (struct cgr
>       fprintf (dump_file, " Inlining %s into %s.\n",
>                xstrdup_for_dump (callee->name ()),
>                xstrdup_for_dump (e->caller->name ()));
> -      inline_call (e, true, NULL, NULL, true);
> +      inline_call (e, true, NULL, NULL, false);
>        inlined = true;
>      }
>  
> +  if (inlined)
> +    inline_update_overall_summary (node);
> +
>    return inlined;
>  }
>  

Reply via email to