Thanks,

Di

-----Original Message-----
From: Gcc-patches 
<gcc-patches-bounces+dizhao=os.amperecomputing....@gcc.gnu.org> On Behalf Of Di 
Zhao OS via Gcc-patches
Sent: Friday, September 17, 2021 2:13 AM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence 
map" for condition prediction

Sorry about updating on this after so long. It took me much time to work out a
new plan and pass the tests.

The new idea is to use one variable to represent a set of equal variables at
some basic-block. This variable is called a "equivalence head" or "equiv-head"
in the code. (There's no-longer a "equivalence map".)

- Initially an SSA_NAME's "equivalence head" is its value number. Temporary
  equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
  map. Besides, when inserting into vn_nary_op_t map, make the new result at
  front of the vn_pval list, so that when searching for a variable's
  equivalence head, the first result represents the largest equivalence set at
  current location.
- In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
  For recorded equivalences, the reference is result->entry; for normal N-ary
  operations, the reference is operand->entry.
- When recording equivalences, if one side A is constant or has more refs, make
  it the new equivalence head of the other side B. Traverse B's ref-list, if a
  variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
  operations by replacing B with A.
- When inserting and looking for the results of n-ary operations, insert and
  lookup by the operands' equiv-heads.

So except for the refs in vn_ssa_aux_t, this scheme uses the original
infrastructure to its best. Quadric search time is avoided at the cost of some
re-insertions. Test results on SPEC2017 intrate (counts and percentages):

                |more bb |more bb |more stmt|more stmt|more       |more
                |removed |removed |removed  |removed  |nv_nary_ops|nv_nary_ops
                |at fre1 |at fre1 |at fre1  |at fre1  |inserted   |inserted
------------------------------------------------------------------------------
 500.perlbench_r| 64     | 1.98%  | 103     | 0.19%   | 11260     | 12.16%
 502.gcc_r      | 671    | 4.80%  | 317     | 0.23%   | 13964     | 6.09%
 505.mcf_r      | 5      | 35.71% | 9       | 1.40%   | 32        | 2.52%
 520.omnetpp    | 132    | 5.45%  | 39      | 0.11%   | 1895      | 3.91%
 523.xalancbmk_r| 238    | 3.26%  | 313     | 0.36%   | 1417      | 1.27%
 525.x264_r     | 4      | 1.36%  | 27      | 0.11%   | 1752      | 6.78%
 531.deepsjeng_r| 1      | 3.45%  | 2       | 0.14%   | 228       | 8.67%
 541.leela_r    | 2      | 0.63%  | 0       | 0.00%   | 92        | 1.14%
 548.exchange2_r| 0      | 0.00%  | 3       | 0.04%   | 49        | 1.03%
 557.xz_r       | 0      | 0.00%  | 3       | 0.07%   | 272       | 7.55%

There're more basic_blocks and statements removed compared with last
implementation, the reasons are:
1) "CONST op CONST" simplification is included. It is missed in previous patch.
2) By inserting RHS of statements on equiv-heads, more N-ary operations can be
   simplified. One example is in 'ssa-fre-97.c' in the patch file.

While jump-threading & vrp also utilize temporary equivalences (so some of the
newly removed blocks and statements can also be covered by them), I think this
patch is a supplement, in cases when jump threading cannot take place (the
original example), or value number info needs to be involved (the
'ssa-fre-97.c' example).

Fixed the former issue with non-iterate mode.

About recording the temporary equivalences generated by PHIs (i.e. the
'record_equiv_from_previous_cond' stuff), I have to admit it looks strange and
the code size is large, but I haven't find a better way yet. Consider a piece
of CFG like the one below, if we want to record x==x2 on the true edge when
processing bb1, the location (following current practice) will be bb2. But that
is not useful at bb5 or bb6, because bb2 doesn't dominate them. And I can't
find a place to record x==x1 when processing bb1.
If we can record things on edges rather than blocks, say x==x1 on 1->3 and
x==x2 on 1->2, then perhaps with an extra check for "a!=0", x2 can be a valid
equiv-head for x since bb5. But I think it lacks efficiency and is not
persuasive. It is more efficient to find a valid previous predicate when
processing bb4, because the vn_nary_op_t will be fetched anyway.
        --------------
        | if (a != 0) | bb1
        --------------
        f |      \ t
          |    -------
          |    | bb2 | 
          |    -------
          |      /
    -------------------------
    | x = PHI<x1(1), x2(2)> | bb3
    -------------------------
               |
              ....
               |
       --------------
       | if (a != 0) | bb4
       --------------
           |f     \t
    -------------  -------
bb7 | where     |  | bb5 |  ==> where "x==x2" is recorded now
    | "x==x1" is|  -------
    | recorded  |    \
    | now       |    ...
    -------------     |
                   -------
                   | bb6 |  ==> where "x==x2" needs to be used
                   -------
Although I think I can remove the 'dominator_to_phi_map' and generalize this a
little, but the major logic will be similar. So I left this unchanged for now,
and would like to hear your suggestions on the new scheme overall.

Thanks,
Di Zhao

--------
Extend FRE with temporary equivalences.

2021-09-16  Di Zhao  <diz...@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (dominated_by_p_w_unex): Moved upward, no change.
        (vn_nary_op_get_predicated_value): Moved upward, no change.
        (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
        (lookup_equiv_head): Lookup the "equivalence head" of given node.
        (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
        (vn_tracking_edge): Extracted utility function.
        (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
        (vn_nary_op_insert_into): Insert new value at the front.
        (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
        from pieces.
        (simplify_nary_op): Try to simplify N-ary expressions.
        (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
        (vn_nary_op_insert_pieces_predicated): Insert by "equivalence head"s.
        (is_relaxed_cond_code): Whether operation code is a relaxed condition
        code derived from original code.
        (branch_may_come_from_another): Whether there's a path from the one
        true or false destination to another.
        (val_equiv_insert): Record temporary equivalence.
        (record_equiv_from_previous_cond): Record equivalences generated by
        previous conditions on current BB's true and false edges.
        (find_predicated_value_by_equivs): Find predicated value by the
        operands' equivalences.
        (record_dom_to_phi_bb): Record mappings from immediate dominator to
        basic_block with PHIs.
        (vn_phi_insert): Record mapping from the immediate dominator to a PHI
        node.
        (visit_nary_op): Add lookup previous results of N-ary operations by
        equivalences.
        (free_rpo_vn): Free dominator_to_phi_map.
        (process_bb): Add lookup predicated values by equivalences.
        (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
        (do_unwind): Unwind the back-refs to vn_nary_op_t.
        (do_rpo_vn): Update back-reference unwind state.
        * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
        nary map entries.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-96.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-97.c: New test.

Attachment: v2-tree-optimization-101186.patch
Description: v2-tree-optimization-101186.patch

Reply via email to