On Tue, 16 Aug 2022, Aldy Hernandez wrote:

> On Tue, Aug 16, 2022 at 11:18 AM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Mon, 15 Aug 2022, Aldy Hernandez wrote:
> >
> > > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacl...@redhat.com>
> > > wrote:
> > > >
> > > > heh. or just
> > > >
> > > > +  int_range<2> r;
> > > > +  if (!fold_range (r, const_cast <gcond *> (cond_stmt))
> > > > +      || !r.singleton_p (&val))
> > > >
> > > > if you do not provide a range_query to any of the fold_using_range
> > > > code, it defaults to:
> > > >
> > > > fur_source::fur_source (range_query *q)
> > > > {
> > > >   if (q)
> > > >     m_query = q;
> > > >   else if (cfun)
> > > >     m_query = get_range_query (cfun);
> > > >   else
> > > >     m_query = get_global_range_query ();
> > > >   m_gori = NULL;
> > > > }
> > > >
> > >
> > > Sweet.  Even better!
> >
> > So when I do the following incremental change on top of the posted
> > patch, then I see that the path query is able to simplify more
> > "single BB paths" than the global range folding.
> >
> > diff --git a/gcc/tree-ssa-threadbackward.cc
> > b/gcc/tree-ssa-threadbackward.cc
> > index 669098e4ec3..777e778037f 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const
> > vec<basic_block> &path,
> >  {
> >    int_range_max r;
> >
> > +  int_range<2> rf;
> > +  if (path.length () == 1)
> > +    {
> > +      fold_range (rf, cond);
> > +    }
> > +
> >    m_solver->compute_ranges (path, m_imports);
> >    m_solver->range_of_stmt (r, cond);
> >
> > @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const
> > vec<basic_block> &path,
> >
> >    if (r == true_range || r == false_range)
> >      {
> > +      if (path.length () == 1)
> > +        gcc_assert (r == rf);
> >        edge e_true, e_false;
> >        basic_block bb = gimple_bb (cond);
> >        extract_true_false_edges_from_block (bb, &e_true, &e_false);
> >
> > Even doing the following (not sure what's the difference and in
> > particular expense over the path range query) results in missed
> > simplifications (checking my set of cc1files).
> >
> > diff --git a/gcc/tree-ssa-threadbackward.cc
> > b/gcc/tree-ssa-threadbackward.cc
> > index 669098e4ec3..1d43a179d08 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -99,6 +99,7 @@ private:
> >
> >    back_threader_registry m_registry;
> >    back_threader_profitability m_profit;
> > +  gimple_ranger *m_ranger;
> >    path_range_query *m_solver;
> >
> >    // Current path being analyzed.
> > @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun,
> > unsigned flags, bool first)
> >    // The path solver needs EDGE_DFS_BACK in resolving mode.
> >    if (flags & BT_RESOLVE)
> >      mark_dfs_back_edges ();
> > -  m_solver = new path_range_query (flags & BT_RESOLVE);
> > +  m_ranger = new gimple_ranger;
> > +  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
> >  }
>
> Passing an allocated ranger here results in fewer simplifications than
> letting path_range_query allocate its own?  That's not right.  Or do
> you mean that using fold_range() with the m_ranger causes ICEs with
> your patch (due to the non-null processing described below)?
Yes, I've needed a ranger to use fold_range (..., m_ranger), which I
thought might do more than not passing one.

> >
> >  back_threader::~back_threader ()
> >  {
> >    delete m_solver;
> > +  delete m_ranger;
> >
> >    loop_optimizer_finalize ();
> >  }
> > @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const
> > vec<basic_block> &path,
> >  {
> >    int_range_max r;
> >
> > +  int_range<2> rf;
> > +  if (path.length () == 1)
> > +    {
> > +      fold_range (rf, cond, m_ranger);
> > +    }
> > +
> >    m_solver->compute_ranges (path, m_imports);
> >    m_solver->range_of_stmt (r, cond);
> >
> > @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const
> > vec<basic_block> &path,
> >
> >    if (r == true_range || r == false_range)
> >      {
> > +      if (path.length () == 1)
> > +        gcc_assert (r == rf);
> >        edge e_true, e_false;
> >        basic_block bb = gimple_bb (cond);
> >        extract_true_false_edges_from_block (bb, &e_true, &e_false);
> >
> > one example is
> >
> > <bb 176> [local count: 14414059]:
> > _128 = node_177(D)->typed.type;
> > pretmp_413 = MEM[(const union tree_node *)_128].base.code;
> > _431 = pretmp_413 + 65519;
> > if (_128 == 0B)
> >   goto <bb 199>; [18.09%]
> > else
> >   goto <bb 177>; [81.91%]
> >
> > where m_imports for the path is just _128 and the range computed is
> > false while the ranger query returns VARYING.  But
> > path_range_query::range_defined_in_block does
> >
> >   if (bb && POINTER_TYPE_P (TREE_TYPE (name)))
> >     m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb);
> >
> > which adjusts the range to ~[0, 0], probably because of the
> > dereference in the following stmt.
> >
> > Why does fold_range not do this when folding the exit test?  Is there
> > a way to make it do so?  It looks like the only routine using this
> > in gimple-range.cc is range_on_edge, and there it's used for e->src
> > after calling range_on_exit for e->src (why's it not done in
> > range_on_exit?).
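The adjustment in question can be modeled outside of GCC.  The following
standalone sketch (illustrative names and types, not GCC's actual API)
shows why folding the exit test from the operand's computed range alone
yields VARYING, while first intersecting with the block's inferred
non-null range lets the test fold:

```cpp
// Standalone model (not GCC code) of the inferred-range adjustment
// discussed above.  A dereference of a pointer earlier in a block
// implies the pointer is non-null at the block's exit, so an exit test
// like "if (_128 == 0B)" can fold to false even though the computed
// range of _128 by itself is VARYING.

enum ptr_range { VARYING, NONNULL, NULL_ONLY };

// Rough stand-in for maybe_adjust_range: intersect the pointer's range
// with ~[0, 0] when the block contains a dereference of it.
static ptr_range
maybe_adjust_range (ptr_range r, bool block_derefs_ptr)
{
  if (block_derefs_ptr && r == VARYING)
    return NONNULL;
  return r;
}

// Fold "p == 0": 1 = always true, 0 = always false, -1 = unknown.
static int
fold_eq_null (ptr_range r)
{
  switch (r)
    {
    case NONNULL:   return 0;
    case NULL_ONLY: return 1;
    default:        return -1;
    }
}
```

In this model, plain folding of VARYING returns "unknown" (as fold_range
does today), while folding after the adjustment returns "always false"
(as the path query does via range_defined_in_block).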
A testcase for this is

> Andrew's gonna have to answer this one, because I'm just a user of the
> infer_range infrastructure.  But yes, you're right... fold_range
> doesn't seem to take into account side-effects such as non-null.

OK, let's see.  I can probably use the path query as well - I'll see
about producing a prototype of doing those simplifications early,
avoiding threadings there and through unreachable parts of the CFG.

Richard.

> Aldy
>
> >
> > int foo (int **p, int i)
> > {
> >   int *q = *p;
> >   int res = *q + i;
> >   if (q)
> >     return res;
> >   return -1;
> > }
> >
> > which we "thread" with the path, and with the above ICEs because
> > fold_range doesn't get that if (q) is always true.  Without the
> > patch ethread doesn't want to duplicate the block (it's too large)
> > but threadfull will if you disable evrp (if you remove the increment
> > by 'i' it again won't, since nothing interesting prevails and it
> > won't go to BB 0 and fails to pick up a thread of length > 1):
> >
> > Checking profitability of path (backwards):  bb:2 (6 insns) bb:0
> >   Control statement insns: 2
> >   Overall: 4 insns
> >   [1] Registering jump thread: (0, 2) incoming edge;  (2, 3) nocopy;
> > path: 0->2->3 SUCCESS
> > Removing basic block 2
> > ;; basic block 2, loop depth 0
> > ;;  pred:
> > _1 = *p_6(D);
> > _2 = (long unsigned int) n_7(D);
> > _3 = _2 * 4;
> > q_8 = _1 + _3;
> > res_9 = *q_8;
> > if (q_8 != 0B)
> >   goto <bb 3>; [98.33%]
> > else
> >   goto <bb 4>; [1.67%]
> > ;;  succ: 3
> > ;;       4
> >
> > ... (it copied BB 2) ...
> >
> > int foo (int * * p, int n)
> > {
> >   int res;
> >   int * q;
> >   int * _10;
> >   long unsigned int _11;
> >   long unsigned int _12;
> >
> >   <bb 2> [local count: 1073741824]:
> >   _10 = *p_6(D);
> >   _11 = (long unsigned int) n_7(D);
> >   _12 = _11 * 4;
> >   q_13 = _10 + _12;
> >   res_14 = *q_13;
> >   return res_14;

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
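The reduced testcase quoted in the thread can be exercised directly;
since *q is dereferenced before the "if (q)" test, a conforming
execution can only reach the test with q non-null, which is what
entitles the threader to fold the condition to true (the "return -1"
path is reachable only after undefined behavior).  A small driver (the
variable names and values below are illustrative, not part of the
thread):

```cpp
// The reduced testcase from the thread, unchanged.
int foo (int **p, int i)
{
  int *q = *p;       // q is loaded ...
  int res = *q + i;  // ... and dereferenced here, so q must be non-null
  if (q)             // hence this test is always true on valid inputs
    return res;
  return -1;         // dead on any execution without undefined behavior
}
```

Calling foo with a valid pointer always takes the true edge and returns
*q + i, matching the threaded CFG in which BB 2 falls straight through
to the return.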