RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

Di Zhao OS via Gcc-patches Mon, 15 Nov 2021 09:24:33 -0800

Attached is the updated patch. Fixed some errors in testcases.

> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Wednesday, November 10, 2021 5:44 PM
> To: Di Zhao OS <diz...@os.amperecomputing.com>
> Cc: gcc-patches@gcc.gnu.org; Andrew MacLeod <amacl...@redhat.com>
> Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> "equivalence map" for condition prediction
> 
> On Sun, Oct 24, 2021 at 9:03 PM Di Zhao OS
> <diz...@os.amperecomputing.com> wrote:
> >
> > Hi,
> >
> > Attached is a new version of the patch, mainly for improving performance
> > and simplifying the code.
> 
> The patch doesn't apply anymore, can you update it please?
> 
> I see the new ssa-fre-101.c test already passing without the patch.


It was a mistake in test ssa-fre-101.c::g to define the variables with
the unsigned integers, in this way "a >= 0" is always true. After modified
the case, now fre1 in the patch can remove unreachable call "foo ()", and
evrp on trunk does not.

> Likewise ssa-fre-100.c and ssa-fre-102.c would PASS if you scan
> the pass dump after fre1 which is evrp so it seems that evrp already
> handles the equivalences (likely with the relation oracle) now?
> I'm sure there are second order effects when eliminating conditions
> in FRE but did you re-evaluate what made you improve VN to see
> if the cases are handled as expected now without this change?

In case ssa-fre-102.c, the unreachable call "foo ()" can be removed by
evrp, but fre in the patch can additionally replace "g = x + b" with
"g = 99". (Again I'm sorry the regex to check this was wrong..)

Test cases to simulate the original problem I found are moved into
gcc.dg/tree-ssa/ssa-pre-34.c. The unreachable calls to "foo ()" are
still removed by pre with the patch. (If compiled with -O3, the
"foo ()"s in test file can now be removed by thread2/threadfull2 and
dom3 on trunk. This relies on jump threading across the loops, so
even with -O3, similar cases may not get optimized say if there're
too many statements to copy.)

Thanks,
Di Zhao

> 
> I will still look at and consider the change btw, but given the EVRP
> improvements I'm also considering to remove the predication
> support from VN alltogether.  At least in the non-iterating mode
> it should be trivially easy to use rangers relation oracle to simplify
> predicates.  For the iterating mode it might not be 100% effective
> since I'm not sure we can make it use the current SSA values and
> how it would behave with those eventually changing to worse.
> 
> Andrew, how would one ask the relation oracle to simplify a
> condition?  Do I have to do any bookkeeping to register
> predicates on edges for it?
> 
> Thanks,
> Richard.
> 
> > First, regarding the comments:
> >
> > > -----Original Message-----
> > > From: Richard Biener <richard.guent...@gmail.com>
> > > Sent: Friday, October 1, 2021 9:00 PM
> > > To: Di Zhao OS <diz...@os.amperecomputing.com>
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> > > "equivalence map" for condition prediction
> > >
> > > On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
> > > <diz...@os.amperecomputing.com> wrote:
> > > >
> > > > Sorry about updating on this after so long. It took me much time to 
> > > > work out a
> > > > new plan and pass the tests.
> > > >
> > > > The new idea is to use one variable to represent a set of equal 
> > > > variables at
> > > > some basic-block. This variable is called a "equivalence head" or 
> > > > "equiv-head"
> > > > in the code. (There's no-longer a "equivalence map".)
> > > >
> > > > - Initially an SSA_NAME's "equivalence head" is its value number. 
> > > > Temporary
> > > >   equivalence heads are recorded as unary NOP_EXPR results in the 
> > > > vn_nary_op_t
> > > >   map. Besides, when inserting into vn_nary_op_t map, make the new 
> > > > result at
> > > >   front of the vn_pval list, so that when searching for a variable's
> > > >   equivalence head, the first result represents the largest equivalence 
> > > > set at
> > > >   current location.
> > > > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary 
> > > > entry.
> > > >   For recorded equivalences, the reference is result->entry; for normal 
> > > > N-ary
> > > >   operations, the reference is operand->entry.
> > > > - When recording equivalences, if one side A is constant or has more 
> > > > refs, make
> > > >   it the new equivalence head of the other side B. Traverse B's 
> > > > ref-list,if a
> > > >   variable C's previous equiv-head is B, update to A. And re-insert B's 
> > > > n-ary
> > > >   operations by replacing B with A.
> > > > - When inserting and looking for the results of n-ary operations, 
> > > > insert and
> > > >   lookup by the operands' equiv-heads.
> > > > ...
> > > >
> > > > Thanks,
> > > > Di Zhao
> > > >
> > > > --------
> > > > Extend FRE with temporary equivalences.
> > >
> > > Comments on the patch:
> > >
> > > +  /* nary_ref count.  */
> > > +  unsigned num_nary_ref;
> > > +
> > >
> > > I think a unsigned short should be enough and that would nicely
> > > pack after value_id together with the bitfield (maybe change that
> > > to unsigned short :1 then).
> >
> > Changed num_nary_ref to unsigned short and moved after value_id.
> >
> > > @@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
> > >             tree val = gimple_simplify (gimple_cond_code (last),
> > >                                         boolean_type_node, lhs, rhs,
> > >                                         NULL, vn_valueize);
> > > +           vn_nary_op_t vnresult = NULL;
> > >             /* If the condition didn't simplfy see if we have recorded
> > >                an expression from sofar taken edges.  */
> > >             if (! val || TREE_CODE (val) != INTEGER_CST)
> > >               {
> > > -               vn_nary_op_t vnresult;
> > >
> > > looks like you don't need vnresult outside of the if()?
> >
> > vnresult is reused later to record equivalences generated by PHI nodes.
> >
> > > +/* Find predicated value of vn_nary_op by the operands' equivalences. 
> > > Return
> > > + * NULL_TREE if no known result is found.  */
> > > +
> > > +static tree
> > > +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
> > > +                                vn_nary_op_t *vnresult)
> > > +{
> > > +  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
> > > +  tree result
> > > +    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
> > >
> > > why is it necessary to simplify here?  It looks like the caller
> > > already does this.
> >
> > In the new patch, changed the code a little to remove redundant calculation.
> >
> > > I wonder whether it's valid to always perform 
> > > find_predicated_value_by_equivs
> > > from inside vn_nary_op_get_predicated_value instead of bolting it to only
> > > a single place?
> >
> > Removed function find_predicated_value_by_equivs and inlined the code.
> >
> > Because lookup_equiv_head uses vn_nary_op_get_predicated_value, so I left
> > vn_nary_op_get_predicated_value unchanged. Instead, operands are set to
> > equiv-heads in init_vn_nary_op_from_stmt. So altogether, predicates are 
> > always
> > inserted and searched by equiv-heads.
> >
> > > +
> > > +static vn_nary_op_t
> > > +val_equiv_insert (tree op1, tree op2, edge e)
> > > +{
> > >
> > > +  if (is_gimple_min_invariant (lhs))
> > > +    std::swap (lhs, rhs);
> > > +  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
> > > +    /* Possible if invoked from record_equiv_from_previous_cond.  */
> > > +    return NULL;
> > >
> > > Better formulate all of the above in terms of only SSA_NAME checks 
> > > since...
> > >
> > > +  /* Make the hand-side with more recorded n-ary expressions new
> > > +   * equivalence-head, to make fewer re-insertions.  */
> > > +  if (TREE_CODE (rhs) == SSA_NAME
> > > +      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
> > > +    std::swap (lhs, rhs);
> > >
> > > here LHS needs to be an SSA_NAME.
> >
> > Tried to fix this in the new patch.
> >
> > > +  /* Record equivalence as unary NOP_EXPR.  */
> > > +  vn_nary_op_t val
> > > +    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE 
> > > (lhs),
> > > +                                            &lhs, rhs, 0, bb);
> > >
> > > so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
> > > with value rhs?
> > > Presumably you think that's good enough to not overlap with entries
> > > for "real" IL?
> >
> > By checking related codes, it seems to me there's no overlap as long as
> > equivalences are inserted and searched as predicated_values. It should be 
> > easy
> > to replace this with other preferred opcode.
> >
> > > In particular why did you choose to _not_ use (2, EQ_EXPR, 
> > > boolean_type_node,
> > > &[lhs, rhs], boolean_true_node, ...) like I think what would be
> > > inserted for by the
> > > existing code in process_bb via vn_nary_op_insert_pieces_predicated?
> > > It would probably felt more natural if the "new" code bolted on the case 
> > > where
> > > that is called with an equivalence?
> >
> > The "equivalence head"s actually implement Disjoint-sets. So by a hashtable
> > search in the nary-map, the set of equivalences of a variable can be found.
> > Also it's easy to find out whether two variables are equal. And the
> > disjoint-sets are path-compressed by updating "equivalence head" and
> > re-inserting predicates.
> >
> > In the new attached patch, predicates like "A==B is true" and "A!=B is 
> > false"
> > are not inserted, as they can be computed from equivalences. Same with the
> > predicates derived from them. This reduces insertions of predicates by a 
> > lot (
> > counting the equivalences themselves and re-inserted predicates). Below are
> > test results on SPEC intrate, comparing with trunk (f45610a4, Oct 19):
> >
> >               |more bb |more bb |more stmt|more stmt|less      |less
> >               |removed |removed |removed  |removed  |predicates|predicates
> >               |at fre1 |at fre1 |at fre1  |at fre1  |inserted  |inserted
> > --------------------------------------------------------------------------
> > 500.perlbench_r| 64    | 1.56%  | 102    | 0.18%   | 51928    | 53.44%
> > 502.gcc_r      | 679   | 4.88%  | 290    | 0.21%   | 149968   | 65.5%
> > 505.mcf_r      | 5     | 35.71% | 9      | 1.40%   | 349      | 27.48%
> > 520.omnetpp    | 132   | 5.36%  | 48     | 0.14%   | 28192    | 58.38%
> > 523.xalancbmk_r| 231   | 3.15%  | 308    | 0.35%   | 65131    | 58.95%
> > 525.x264_r     | 4     | 1.36%  | 27     | 0.11%   | 10401    | 40.68%
> > 531.deepsjeng_r| 1     | 3.45%  | 2      | 0.14%   | 1412     | 53.81%
> > 541.leela_r    | 2     | 0.62%  | 5      | 0.06%   | 3874     | 48.56%
> > 548.exchange2_r| 0     | 0.00%  | 3      | 0.04%   | 159      | 3.33%
> > 557.xz_r       | 0     | 0.00%  | 3      | 0.07%   | 1894     | 52.39%
> >
> > Considering the copying and unwinding work saved, this patch could reduce
> > compile time rather than increase it.
> >
> > > Btw, I didn't find any condition that prevents equivalences being used for
> > > non-integer types?  Are you sure this is all valid for FP compares without
> > > further restrictions?
> >
> > I think this approach follows sccvn's previous decisions on FP comparisons,
> > for equivalences are inserted only when EQ_EXPRs were evaluated to be true.
> > The patch is tested on regression tests and SPEC fprate cases.
> >
> > >
> > > @@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
> > > int length, enum tree_code code,
> > >        fprintf (dump_file, " == %s\n",
> > >                integer_zerop (result) ? "false" : "true");
> > >      }
> > > -  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
> > > -  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
> > > -  vno1->predicated_values = 1;
> > > -  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
> > > -                                             sizeof (vn_pval));
> > > -  vno1->u.values->next = NULL;
> > > -  vno1->u.values->result = result;
> > > -  vno1->u.values->n = 1;
> > > -  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
> > > -  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
> > > +  /* Insert by operands' equiv-heads.  */
> > > +  tree new_ops[3];
> > > +  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
> > > +  /* Skip if the result is known.  */
> > > +  tree simplified = simplify_nary_op (length, code, new_ops, type);
> > > +  if (simplified)
> > > +    {
> > > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > > +       {
> > > +         fprintf (dump_file, "Result is known: ");
> > > +         print_generic_expr (dump_file, simplified, TDF_NONE);
> > > +         fprintf (dump_file, ", skipped.\n");
> > > +       }
> > > +      return NULL;
> > > +    }
> > >
> > > soo - how do the existing predicated expressions change to be handled 
> > > here?
> > > Why would they ever be simplified?  Should this simplification not better 
> > > happen
> > > upthread when visiting the conditional?  Maybe that needs to use the 
> > > equivs
> > > already?
> >
> > This happens when updating variable A's equiv_head from B to C, and
> > re-inserting former predicates of B with C. For example, "x < 0" is 
> > recorded,
> > then x is assigned with new equiv_head -1 at some location, then -1 < 0 can
> > simplify. It shouldn't happen when inserting new predicates.
> >
> > >
> > > You re-use the predicated value list which I think is great, can you 
> > > explain
> > > how "latest" is meant in the following?
> > >
> > > @@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
> > > vn_nary_op_table_type *table,
> > >               next = &(*next)->next;
> > >             }
> > >           if (!found)
> > > -           *next = nval;
> > > +           {
> > > +             /* Insert new value at the front, so that lookup_equiv_head 
> > > can
> > > +              * find the latest result.  */
> > > +             nval->next = vno->u.values;
> > > +             vno->u.values = nval;
> > > +           }
> >
> > It meant the "most general" result. (I picked the wrong word. The 
> > description
> > is changed in the new patch file. )
> >
> > There can be multiple valid results at the same time. As we traverse in RPO 
> > and
> > update equiv-heads (i.e. unite disjoint-sets), the last inserted result
> > represents the most complete set of equivalences of the variable.
> >
> > (In the last patch version, when "Appending predicate to value", the new 
> > result
> > is not moved to the front. That violates the property of equiv-heads and 
> > some
> > opportunities will be lost. Fixed this in the new patch file.)
> >
> > >
> > > OK, I have to stop now but will try to understand and review further next 
> > > week.
> > > Sorry for the delays but the patch is quite complex to understand :/
> > >
> > > Richard.
> > >
> > > > 2021-09-16  Di Zhao  <diz...@os.amperecomputing.com>
> > > >
> >
> > Other changes:
> > - Fixed some errs in comments.
> > - On recording temporary equivalences generated by PHI nodes, altered the
> >   logic, hope it can be a bit more clear.
> >   1. When a conditional expression's true/false edge E is assessed to be
> >      executable, search for conflicting predicates for E to be taken. If a
> >      conflicting predicate is found at L, and L's single predecessor P
> >      dominates current BB, then we know P->L cannot be taken if E is taken.
> >   2. Process phi_bb (immediately dominated by P, and dominates BB), rule out
> >      PHI arguments (x2) that can only come from P->L. If there's a single
> >      PHI argument (x1) left, record the equivalence of x1 and PHI result x
> >      at edge E.
> >           -----
> >           | P |
> >           -----
> >            |  \
> >            |   -----
> >            |   | L |  <== conflicting predicate's location
> >            |   -----
> >            |    /
> >   -------------------------
> >   | x = PHI<x1(P), x2(L)> | phi_bb
> >   -------------------------
> >              |
> >             ....
> >              |
> >            -----
> >            | BB | <== current bb
> >            -----
> >           /     \ edge E
> >         ...     -------
> >                 | dest |
> >                 -------
> >
> > - Fixed a correctness problem when recording equivalences from PHI in 
> > previous
> >   version. Also test case ssa-fre-95.c:k in previous version was incorrect,
> >   fixed that.
> >
> > Thanks,
> > Di Zhao

---

Extend FRE with temporary equivalences.

2021-11-15  Di Zhao  <diz...@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
        predicate already).
        (dominated_by_p_w_unex): Moved upward.
        (vn_nary_op_get_predicated_value): Moved upward.
        (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
        (lookup_equiv_head): Lookup the "equivalence head" of given node.
        (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
        (vn_tracking_edge): Extracted utility function.
        (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
        (vn_nary_op_insert_into): Insert new value at the front.
        (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
        from pieces.
        (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
        (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
        (val_equiv_insert): Record temporary equivalence.
        (vn_nary_op_insert_pieces_predicated): Record equivalences instead of
        some predicates; insert back-refs.
        (record_equiv_from_prev_phi_1): Record temporary equivalences generated
        by PHI nodes.
        (record_equiv_from_prev_phi): Given an outgoing edge of a conditional
        expression taken, record equivalences generated by PHI nodes.
        (visit_nary_op): Add lookup previous results of N-ary operations by
        equivalences.
        (insert_related_predicates_on_edge): Some predicates can be computed
        from equivalences, no need to insert them.
        (process_bb): Add lookup predicated values by 
equivalenc~/tmp/time-frees.
        (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
        (do_unwind): Unwind the back-refs to vn_nary_op_t.
        (do_rpo_vn): Update back-reference unwind state.
        * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
        nary map entries.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-100.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-101.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-102.c: New test.
        * gcc.dg/tree-ssa/ssa-pre-34.c: New test.

v4-tree-optimization-101186.patch
Description: v4-tree-optimization-101186.patch

RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

Reply via email to