On Mon, Dec 6, 2021 at 4:45 PM Jan Hubicka <hubi...@kam.mff.cuni.cz> wrote:
>
> > On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
> > > <richard.sandif...@arm.com> wrote:
> > > >
> > > > Richard Biener <richard.guent...@gmail.com> writes:
> > > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >>
> > > > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > > > >> first things the function tries is the get_modref_function_summary
> > > > >> approach, but that also seems to be the most expensive check.
>
> In general it should not be that expensive, since the trees are generally
> small (despite the 3 nested loops making them look dangerously big) and
> most of the time we resort to simple pointer/array lookups.
>
> From the lcov stats at https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
> (which are quite useful when trying to understand the typical behaviour of
> the alias oracle):
>
> In call_may_clobber_ref_p we get
>
>   2980 :  256141790 :   callee = gimple_call_fndecl (call);
>   2981 :            :
>   2982 :  256141790 :   if (callee != NULL_TREE && !ref->volatile_p)
>   2983 :            :     {
>   2984 :  240446150 :       struct cgraph_node *node = cgraph_node::get (callee);
>
> this is a direct pointer from the node
>
>   2985 :  240446150 :       if (node)
>   2986 :            :         {
>   2987 :  240234208 :           modref_summary *summary = get_modref_function_summary (node);
>
> This is a fast summary, so it is an array lookup
>
>   2988 :  240234208 :           if (summary)
>   2989 :            :             {
>   2990 :   19732392 :               if (!modref_may_conflict (call, summary->stores, ref, tbaa_p)
>
> And we get here in 8% of invocations.
>
> And in modref_may_conflict we have:
>
>        :   21147836 : modref_may_conflict (const gcall *stmt,
>   2552 :            :                      modref_tree <alias_set_type> *tt, ao_ref *ref, bool tbaa_p)
>   2553 :            : {
>   2554 :   21147836 :   alias_set_type base_set, ref_set;
>   2555 :            :
>   2556 :   21147836 :   if (tt->every_base)
>   2557 :            :     return true;
>   2558 :            :
>   2559 :    3407876 :   if (!dbg_cnt (ipa_mod_ref))
>   2560 :            :     return true;
>   2561 :            :
>   2562 :    3407876 :   base_set = ao_ref_base_alias_set (ref);
>   2563 :            :
>   2564 :    3407876 :   ref_set = ao_ref_alias_set (ref);
>
> All direct lookups in most cases
>
>   2565 :            :
>   2566 :    3407876 :   int num_tests = 0, max_tests = param_modref_max_tests;
>   2567 :   14479077 :   for (auto base_node : tt->bases)
>   2568 :            :     {
>   2569 :    6171739 :       if (tbaa_p && flag_strict_aliasing)
>
> On average there are 2 bases traversed by the loop.
>
>   2570 :            :         {
>   2571 :    5253690 :           if (num_tests >= max_tests)
>   2572 :            :             return true;
>   2573 :    5253690 :           alias_stats.modref_tests++;
>   2574 :    5253690 :           if (!alias_sets_conflict_p (base_set, base_node->base))
>
> alias set checks are also reasonably cheap and 50% of the time they avoid
> walking the rest of the tree
>
>   2575 :    2238398 :             continue;
>   2576 :    3015292 :           num_tests++;
>   2577 :            :         }
>   2578 :            :
>   2579 :    3933341 :       if (base_node->every_ref)
>
> We hit the loop only 0.15 times per invocation of the function
>
>   2580 :            :         return true;
>   2581 :            :
>   2582 :   14943624 :       for (auto ref_node : base_node->refs)
>   2583 :            :         {
>   2584 :            :           /* Do not repeat same test as before.  */
>   2585 :    4381325 :           if ((ref_set != base_set || base_node->base != ref_node->ref)
>
> and we usually visit only a bit more than one ref
>
>   2586 :    2548127 :               && tbaa_p && flag_strict_aliasing)
>   2587 :            :             {
>   2588 :    1840866 :               if (num_tests >= max_tests)
>   2589 :            :                 return true;
>   2590 :    1809594 :               alias_stats.modref_tests++;
>   2591 :    1809594 :               if (!alias_sets_conflict_p (ref_set, ref_node->ref))
>   2592 :     314793 :                 continue;
>   2593 :    1494801 :               num_tests++;
>   2594 :            :             }
>   2595 :            :
>   2596 :    4035260 :           if (ref_node->every_access)
>   2597 :            :             return true;
>   2598 :            :
>   2599 :            :           /* TBAA checks did not disambiguate, try individual accesses.  */
>   2600 :   13186541 :           for (auto access_node : ref_node->accesses)
>   2601 :            :             {
>   2602 :    3770816 :               if (num_tests >= max_tests)
>
> Again this loop usually executes only once
>
>   2603 :     463332 :                 return true;
>   2604 :            :
>   2605 :    3770816 :               tree arg = access_node.get_call_arg (stmt);
>
> here the more expensive work starts (we will do ao_ref oracle matching),
> but only about 0.1 times per invocation.
>
> So in general it should not be too bad, as I am also watching the number
> of queries produced by this loop in the stats.
>
> However, moving the PTA tests ahead is probably a good idea (I guess they
> resort to bitmap operations that also do some looping).
>
> > > >     3786            if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
> > > > (gdb) call debug (def_stmt)
> > > > __builtin_memset (p_3(D), 0, 28);
> > > > (gdb) call debug (ref.base)
> > > > *p_3(D)
> > > > (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, ref.base)
> > > > $3 = false
> > >
> > > huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works
> > > for decls.  In fact it should eventually ICE even ;)  That said,
> > > nothing should really initialize the call_clobber set besides
> > > pt_solution_reset at GIMPLE stmt allocation time or IPA PTA when
> > > enabled.  pt_solution_reset sets ->anything to true so the function
> > > should return true ...
>
> Even before my patch we computed call clobbers.  handle_rhs_call sets
> call clobbers for non-readonly nonescaping args via get_call_clobber_vi,
> and later in compute_points_to_sets this info is copied into
> gimple_call_clobber_set:
>
>       pt = gimple_call_clobber_set (stmt);
>       if (gimple_call_flags (stmt) & (ECF_CONST|ECF_PURE|ECF_NOVOPS))
>         memset (pt, 0, sizeof (struct pt_solution));
>       else if ((vi = lookup_call_clobber_vi (stmt)) != NULL)
>         {
>           *pt = find_what_var_points_to (cfun->decl, vi);
>           /* Escaped (and thus nonlocal) variables are always
>              implicitly clobbered by calls.  */
>           /* ??? ESCAPED can be empty even though NONLOCAL
>              always escaped.  */
>           pt->nonlocal = 1;
>           pt->escaped = 1;
>         }
>       else
>         {
>           /* If there is nothing special about this call then
>              we have made everything that is used also escape.  */
>           *pt = cfun->gimple_df->escaped;
>           pt->nonlocal = 1;
>         }
>
> I think you are hitting the following:
>
> static bool
> find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
> {
>   tree fndecl = gimple_call_fndecl (t);
>   auto_vec<ce_s, 2> lhsc;
>   auto_vec<ce_s, 4> rhsc;
>   varinfo_t fi;
>
>   if (gimple_call_builtin_p (t, BUILT_IN_NORMAL))
>     /* ??? All builtins that are handled here need to be handled
>        in the alias-oracle query functions explicitly!  */
>     switch (DECL_FUNCTION_CODE (fndecl))
>       {
>       /* All the following functions return a pointer to the same object
>          as their first argument points to.  The functions do not add
>          to the ESCAPED solution.
>          The functions make the first argument
>          pointed to memory point to what the second argument pointed to
>          memory points to.  */
>       case BUILT_IN_STRCPY:
>
> Notice the ??? comment.  The code does not set clobbers here because it
> assumes that tree-ssa-alias will do the right thing.  So one may do the
> builtin handling first, PTA next, and only continue if both say "may
> alias".  The other option is to extend the code here to add proper
> clobber/use PTA to make things more regular.  I would be happy to help
> with that.
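For concreteness, the situation described above can be reduced to something
like the example below.  It is a hypothetical illustration, not a testcase
from the thread; the struct layout is an assumption chosen only so that the
size matches the __builtin_memset (p_3(D), 0, 28) statement in the gdb
session:

/* Hypothetical reduced example.  *p is reachable only through the call
   argument, so if the call-clobber points-to set computed for the builtin
   does not include it (as the gdb session suggests), a purely PTA-based
   disambiguation would treat the memset as not clobbering p->a and could
   reuse the earlier load.  */
struct s { int a; int pad[6]; };   /* 28 bytes, matching the memset size.  */

int __attribute__ ((noipa))
f (struct s *p)
{
  p->a = 1;
  int before = p->a;               /* known to be 1 */
  __builtin_memset (p, 0, 28);     /* must be treated as clobbering *p */
  return before + p->a;            /* correct: 1 + 0; with bogus CSE: 1 + 1 */
}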
Or avoid setting the gimple_call_clobber/use sets for builtins handled in the
above way.  As said, the info there is currently incorrect.  Note that
originally the call use/clobber vi were only added in IPA mode.  I suppose
you now add them unconditionally (I forgot about all the changes).

Richard.

>
> Honza
>
> > > It looks like Honza added computation of these sets at some point,
> > > also maybe forgetting to set some bits here.
> > >
> > > I prefer to investigate / fix this before refactoring the uses in this
> > > way.  Can you file a bug report please?  I can reproduce the memcpy
> > > FAIL with just moving check_fnspec down.
> >
> > So the issue seems that for
> >   # .MEM_22 = VDEF <.MEM_19>
> >   memcpy (&dst, &src, 1024);
> >
> > determine_global_memory_access determines the stmt doesn't write
> > to global memory (well, yes!), but it fails to add the actual arguments
> > to the set of clobbered vars.
> >
> > Honza, can you please look into this?  Seems to be caused
> > by g:008e7397dad971c03c08fc1b0a4a98fddccaaed8
> >
> > Richard.
> > > Thanks,
> > > Richard.
> > > > Setting ASSERTS to 0 gives:
> > > >
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
> > > > FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
> > > > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > > > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
> > > >
> > > > # of expected passes            23138
> > > > # of unexpected failures        8
> > > > # of unsupported tests          24
> > > >
> > > > (same for -m32).
> > > >
> > > > Originally I saw it as a bootstrap failure building stage 2
> > > > build-side libcpp, due to a bogus uninitialised warning.
> > > >
> > > > But alongside the memset and free examples, there's:
> > > >
> > > >       /* All the following functions do not return pointers, do not
> > > >          modify the points-to sets of memory reachable from their
> > > >          arguments and do not add to the ESCAPED solution.  */
> > > >       case BUILT_IN_SINCOS:
> > > >       case BUILT_IN_SINCOSF:
> > > >       ...
> > > >         return true;
> > > >
> > > > (always an empty set), whereas check_fnspec takes errno into account.
> > > >
> > > > > I wonder if, at this point, we should split the patch up to do the
> > > > > trivial move of the ao_ref_base dependent and volatile checks
> > > > > and leave this more complicated detail for a second patch?
> > > >
> > > > Yeah, that'd work, but in practice it was the PT part that was the
> > > > high-value part.  IIRC the early base checks caught less than 1% of
> > > > cases (I can rerun tonight to get exact numbers).
> > > >
> > > > Thanks,
> > > > Richard
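On the sincos point quoted above, a hypothetical illustration (not a
testcase from the thread) of how the "always an empty set" points-to answer
and the errno-aware check_fnspec answer can diverge; whether the call
actually touches errno is target- and libm-dependent and is assumed here
only for illustration:

/* Hypothetical illustration only.  If the call's clobber set were trusted
   and taken to be empty, the second read of errno could be CSE'd with
   'before' and the function would always return 1, even when the call sets
   errno; check_fnspec instead conservatively assumes errno may be
   written.  */
#include <errno.h>

int
g (double x, double *s, double *c)
{
  errno = 0;
  int before = errno;            /* 0 */
  __builtin_sincos (x, s, c);    /* writes *s and *c, and may set errno */
  return before == errno;        /* must re-read errno after the call */
}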