Re: [PATCH] alias: Optimise call_may_clobber_ref_p

Richard Biener via Gcc-patches Mon, 06 Dec 2021 23:53:01 -0800

On Mon, Dec 6, 2021 at 4:45 PM Jan Hubicka <hubi...@kam.mff.cuni.cz> wrote:
>
> > On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
> > > <richard.sandif...@arm.com> wrote:
> > > >
> > > > Richard Biener <richard.guent...@gmail.com> writes:
> > > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >>
> > > > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > > > >> first things the function tries is the get_modref_function_summary
> > > > >> approach, but that also seems to be the most expensive check.
>
> In general it should not be that expensive since the trees are generally
> small (despite the 3 nested loops making them look dangerously big) and
> most of time we resort to simple pointer/array lookups.
>
> From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
> (which are quite useful when trying to understand typical behaviour of
> the alias oracle)
>
> In call_may_clobber_ref_p we get
>
>     2980                 :  256141790 :   callee = gimple_call_fndecl (call);
>     2981                 :            :
>     2982                 :  256141790 :   if (callee != NULL_TREE && 
> !ref->volatile_p)
>     2983                 :            :     {
>     2984                 :  240446150 :       struct cgraph_node *node = 
> cgraph_node::get (callee);
> this is direct pointer from node
>     2985                 :  240446150 :       if (node)
>     2986                 :            :         {
>     2987                 :  240234208 :           modref_summary *summary = 
> get_modref_function_summary (node);
> This is fast summary so it is array lookup
>     2988                 :  240234208 :           if (summary)
>     2989                 :            :             {
>     2990                 :   19732392 :               if 
> (!modref_may_conflict (call, summary->stores, ref, tbaa_p)
>
> And we get here in 8% of invocations
>
> And in modref_may_conflict we have:
>
>
>    21147836 : modref_may_conflict (const gcall *stmt,
>     2552                 :            :                      modref_tree 
> <alias_set_type> *tt, ao_ref *ref, bool tbaa_p)
>     2553                 :            : {
>     2554                 :   21147836 :   alias_set_type base_set, ref_set;
>     2555                 :            :
>     2556                 :   21147836 :   if (tt->every_base)
>     2557                 :            :     return true;
>     2558                 :            :
>     2559                 :    3407876 :   if (!dbg_cnt (ipa_mod_ref))
>     2560                 :            :     return true;
>     2561                 :            :
>     2562                 :    3407876 :   base_set = ao_ref_base_alias_set 
> (ref);
>     2563                 :            :
>     2564                 :    3407876 :   ref_set = ao_ref_alias_set (ref);
> All direct lookups in most cases
>     2565                 :            :
>     2566                 :    3407876 :   int num_tests = 0, max_tests = 
> param_modref_max_tests;
>     2567                 :   14479077 :   for (auto base_node : tt->bases)
>     2568                 :            :     {
>     2569                 :    6171739 :       if (tbaa_p && 
> flag_strict_aliasing)
> At average there are 2 bases travelled by the loop.
>     2570                 :            :         {
>     2571                 :    5253690 :           if (num_tests >= max_tests)
>     2572                 :            :             return true;
>     2573                 :    5253690 :           alias_stats.modref_tests++;
>     2574                 :    5253690 :           if (!alias_sets_conflict_p 
> (base_set, base_node->base))
> alias set checks cheecks are also reasonably cheap and 50% of time avoids 
> walking rest of the tree
>     2575                 :    2238398 :             continue;
>     2576                 :    3015292 :           num_tests++;
>     2577                 :            :         }
>     2578                 :            :
>     2579                 :    3933341 :       if (base_node->every_ref)
>
> We hit the loop only 0.15 times per invocation of the function
>
>     2580                 :            :         return true;
>     2581                 :            :
>     2582                 :   14943624 :       for (auto ref_node : 
> base_node->refs)
>     2583                 :            :         {
>     2584                 :            :           /* Do not repeat same test 
> as before.  */
>     2585                 :    4381325 :           if ((ref_set != base_set || 
> base_node->base != ref_node->ref)
>
> and usually visit only bit more htan one refs
>     2586                 :    2548127 :               && tbaa_p && 
> flag_strict_aliasing)
>     2587                 :            :             {
>     2588                 :    1840866 :               if (num_tests >= 
> max_tests)
>     2589                 :            :                 return true;
>     2590                 :    1809594 :               
> alias_stats.modref_tests++;
>     2591                 :    1809594 :               if 
> (!alias_sets_conflict_p (ref_set, ref_node->ref))
>     2592                 :     314793 :                 continue;
>     2593                 :    1494801 :               num_tests++;
>     2594                 :            :             }
>     2595                 :            :
>     2596                 :    4035260 :           if (ref_node->every_access)
>     2597                 :            :             return true;
>     2598                 :            :
>     2599                 :            :           /* TBAA checks did not 
> disambiguate, try individual accesses.  */
>     2600                 :   13186541 :           for (auto access_node : 
> ref_node->accesses)
>
>     2601                 :            :             {
>     2602                 :    3770816 :               if (num_tests >= 
> max_tests)
>
> Again this loop usually executes only once
>     2603                 :     463332 :                 return true;
>     2604                 :            :
>     2605                 :    3770816 :               tree arg = 
> access_node.get_call_arg (stmt);
>
> here more expensive work starts (we will do ao ref oracle matching
> but only in 0.1 times per invocation.
>
> So in general it should not be too bad as I am also watching number of
> querries produced by this loop in the stats.
>
> However moving PTA tests ahead is probably a good idea (I guess they
> resort to bitmap operaitons that also do some looping).
> > > > 3786              if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
> > > > (gdb) call debug (def_stmt)
> > > > __builtin_memset (p_3(D), 0, 28);
> > > > (gdb) call debug (ref.base)
> > > > *p_3(D)
> > > > (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, 
> > > > ref.base)
> > > > $3 = false
> > >
> > > huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for 
> > > decls.
> > > In fact it should eventually ICE even ;)  That said, nothing should
> > > really iniitalize
> > > the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation 
> > > time
> > > or IPA PTA when enabled.  pt_solutuon_reset sets ->anything to true so the
> > > function should return true ...
>
> Even before my patch we computed call clobbered. handle_rhs_call sets
> call clobbers for non-readonly nonescape args via get_call_clobber_vi
> and later in compute_points_to_sets this info is copied into
> gimple_call_clobbered_set:
>
>           pt = gimple_call_clobber_set (stmt);
>           if (gimple_call_flags (stmt) & (ECF_CONST|ECF_PURE|ECF_NOVOPS))
>             memset (pt, 0, sizeof (struct pt_solution));
>           else if ((vi = lookup_call_clobber_vi (stmt)) != NULL)
>             {
>               *pt = find_what_var_points_to (cfun->decl, vi);
>               /* Escaped (and thus nonlocal) variables are always
>                  implicitly clobbered by calls.  */
>               /* ???  ESCAPED can be empty even though NONLOCAL
>                  always escaped.  */
>               pt->nonlocal = 1;
>               pt->escaped = 1;
>             }
>           else
>             {
>               /* If there is nothing special about this call then
>                  we have made everything that is used also escape.  */
>               *pt = cfun->gimple_df->escaped;
>               pt->nonlocal = 1;
>             }
>
> I think you are hitting the following:
>
> static bool
> find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
> {
>   tree fndecl = gimple_call_fndecl (t);
>   auto_vec<ce_s, 2> lhsc;
>   auto_vec<ce_s, 4> rhsc;
>   varinfo_t fi;
>
>   if (gimple_call_builtin_p (t, BUILT_IN_NORMAL))
>     /* ???  All builtins that are handled here need to be handled
>        in the alias-oracle query functions explicitly!  */
>     switch (DECL_FUNCTION_CODE (fndecl))
>       {
>       /* All the following functions return a pointer to the same object
>          as their first argument points to.  The functions do not add
>          to the ESCAPED solution.  The functions make the first argument
>          pointed to memory point to what the second argument pointed to
>          memory points to.  */
>       case BUILT_IN_STRCPY:
>
> Notice the ??? comment. The code does not set clobbers here because it
> assumes that tree-ssa-alias will do the right thing.
> So one may make builtins handling first, PTA next and only if both say
> "may alias" continue.  Other option is to extend the code here to add
> propert clobber/use PTA to make things more regular.  I would be happy
> to help with that.


Or avoid setting the gimple_call_clobber/use sets for builtins handled in
the above way.  As said, the info there is currently incorrect.  Note that
originally the call use/clobber vi were only added in IPA mode.  I suppose
you now add them unconditionally (I forgot about all the changes).

Richard.

>
> Honza
> > >
> > > It looks like Honza added computation of these sets at some point, also 
> > > maybe
> > > forgetting to set some bits here.
> > >
> > > I prefer to investigate / fix this before refactoring the uses in this 
> > > way.  Can
> > > you file a bugreport please?  I can reproduce the memcpy FAIL with
> > > just moving check_fnspec down.
> >
> > So the issue seems that for
> > # .MEM_22 = VDEF <.MEM_19>
> > memcpy (&dst, &src, 1024);
> >
> > determine_global_memory_access determines the stmt doesn't write
> > to global memory (well, yes!), but it fails to add the actual arguments
> > to the set of clobbered vars.
> >
> > Honza, can you please look into this?  Seems to be caused
> > by g:008e7397dad971c03c08fc1b0a4a98fddccaaed8
> >
> > Richard.
> >
> > > Thanks,
> > > Richard.
> > >
> > > > Setting ASSERTS to 0 gives:
> > > >
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto 
> > > > -fno-use-linker-plugin -flto-partition=none  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin 
> > > > -fno-fat-lto-objects  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
> > > > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
> > > > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
> > > > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > > > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
> > > >
> > > > # of expected passes            23138
> > > > # of unexpected failures        8
> > > > # of unsupported tests          24
> > > >
> > > > (same for -m32).
> > > >
> > > > Originally I saw it as a bootstrap failure building stage 2
> > > > build-side libcpp, due to a bogus uninitialised warning.
> > > >
> > > > But alongside the memset and free examples, there's:
> > > >
> > > >       /* All the following functions do not return pointers, do not
> > > >          modify the points-to sets of memory reachable from their
> > > >          arguments and do not add to the ESCAPED solution.  */
> > > >       case BUILT_IN_SINCOS:
> > > >       case BUILT_IN_SINCOSF:
> > > >       ...
> > > >         return true;
> > > >
> > > > (always an empty set), whereas check_fnspec takes errno into account.
> > > >
> > > > > I wonder if, at this point, we should split the patch up to do the
> > > > > trivial move of the ao_ref_base dependent and volatile checks
> > > > > and leave this more complicated detail for a second patch?
> > > >
> > > > Yeah, that'd work, but in practice it was the PT part that was the
> > > > high-value part.  IIRC the early base checks caught less than 1% of
> > > > cases (I can rerun tonight to get exact numbers).
> > > >
> > > > Thanks,
> > > > Richard
> > > >

Re: [PATCH] alias: Optimise call_may_clobber_ref_p

Reply via email to