> 
> Yeah, the fast summary array lookup itself seems fine.  What slowed
> this down for me was instead:
> 
>   /* A single function body may be represented by multiple symbols with
>      different visibility.  For example, if FUNC is an interposable alias,
>      we don't want to return anything, even if we have summary for the target
>      function.  */
>   enum availability avail;
>   func = func->function_or_virtual_thunk_symbol
>                (&avail, current_function_decl ?
>                         cgraph_node::get (current_function_decl) : NULL);
>   if (avail <= AVAIL_INTERPOSABLE)
>     return NULL;
> 
> which goes through get_availability (via ultimate_alias_target).
> The most common path for the optabs.ii testcase is:
> 
>   /* Inline functions are safe to be analyzed even if their symbol can
>      be overwritten at runtime.  It is not meaningful to enforce any sane
>      behavior on replacing inline function by different body.  */
>   else if (DECL_DECLARED_INLINE_P (decl))
>     avail = AVAIL_AVAILABLE;
> 
> But to get there we had to go through:
> 
>   else if (ifunc_resolver
>          || lookup_attribute ("noipa", DECL_ATTRIBUTES (decl)))
>     avail = AVAIL_INTERPOSABLE;

Hmm, that is kind of hack (I do not remember doing this).
The visibility checks are frequent in pretty much all IPA propagation
code, so I was always meaning to keep it quite cheap.
Pehraps simply adding noipa flag into cgraph_node itself?

I was also thinking about simply storing visibility to cgraph node. The
catch is that it is not completely invaraint (i.e. if you are seeing
access from a comdat group to other symbol in same comdat group it is
not interposable, while code outside of the comdat group may see it is
as interposable).  I am not even sure how important is to handle that
case except that I did, when I originally implemented the logic.
> 
> Last night I tried caching the noipa attribute, like we do for ifunc.
> That did improve compile time by 0.5% (relative to with the patch
> applied rather than without it).  It also made the cost of the lookup
> part, up to this line:
> 
> >     2988                 :  240234208 :           if (summary)
> 
> negligible even with the original order of the checks.  So I think
> with that change, the lookup becomes cheap, like you say.

Thanks for noticing the attribut lookup.  I guess I should be more
careful about having these in hot parts of IPA code (such as in
inliner).  I will do some coveraging to see how bad it is in practice.
> 
> I don't think adding the caching is safe at this point in stage 3
> though.  Attributes are added by direct manipulation of the tree data
> structure and there are multiple places that in principle could
> (directly or indirectly) add new noipa attributes after the decl
> has been entered into the symbol table.  No existing tests seemed
> to expose that, but it was possible to write a new test that did.

Indeed one needs to be bit careful and not set it while frontend is
still on its (somewhat messy job) of changing trees.  However we only
need it for functions with are definitions and analysyed and we could
easily set it as part of the analyse_function path.  I will cook up a
patch.

Also I disucssed the PTA issues with Richi and I think we are in
agreement that we want to simply fixup the ??? comment and make the PTA
right.  It is also on my todo list.
> 
> >     2989                 :            :             {
> >     2990                 :   19732392 :               if 
> > (!modref_may_conflict (call, summary->stores, ref, tbaa_p)
> >
> > And we get here in 8% of invocations
> 
> Once the lookup is neglible as above, so that the cost is purely on this
> test, I still see a ~2% slowdown relative to the “PT first” ordering.
> I can look into why in more detail, if that's unexpected,  It seemed
> from a quick look to be inner ao_ref stuff.
> 
> Like you said at the top of the quote above, the number of loop
> iterations per query was low.  By far the most common in the optabs.ii
> testcase was:
> 
>   1 outer
>   2 middle
>   2 inner

Yep.  Most common case where we end up with bigger load/store trees are
consturctors that store various fields of the structure.  So reversing
PTA and modref query still makes sense to me.

Honza
> 
> Thanks,
> Richard

Reply via email to