Thanks, Di
-----Original Message----- From: Gcc-patches <gcc-patches-bounces+dizhao=os.amperecomputing....@gcc.gnu.org> On Behalf Of Di Zhao OS via Gcc-patches Sent: Friday, September 17, 2021 2:13 AM To: gcc-patches@gcc.gnu.org Subject: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Sorry about updating on this after so long. It took me much time to work out a new plan and pass the tests. The new idea is to use one variable to represent a set of equal variables at some basic-block. This variable is called a "equivalence head" or "equiv-head" in the code. (There's no-longer a "equivalence map".) - Initially an SSA_NAME's "equivalence head" is its value number. Temporary equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t map. Besides, when inserting into vn_nary_op_t map, make the new result at front of the vn_pval list, so that when searching for a variable's equivalence head, the first result represents the largest equivalence set at current location. - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry. For recorded equivalences, the reference is result->entry; for normal N-ary operations, the reference is operand->entry. - When recording equivalences, if one side A is constant or has more refs, make it the new equivalence head of the other side B. Traverse B's ref-list, if a variable C's previous equiv-head is B, update to A. And re-insert B's n-ary operations by replacing B with A. - When inserting and looking for the results of n-ary operations, insert and lookup by the operands' equiv-heads. So except for the refs in vn_ssa_aux_t, this scheme uses the original infrastructure to its best. Quadric search time is avoided at the cost of some re-insertions. Test results on SPEC2017 intrate (counts and percentages): |more bb |more bb |more stmt|more stmt|more |more |removed |removed |removed |removed |nv_nary_ops|nv_nary_ops |at fre1 |at fre1 |at fre1 |at fre1 |inserted |inserted ------------------------------------------------------------------------------ 500.perlbench_r| 64 | 1.98% | 103 | 0.19% | 11260 | 12.16% 502.gcc_r | 671 | 4.80% | 317 | 0.23% | 13964 | 6.09% 505.mcf_r | 5 | 35.71% | 9 | 1.40% | 32 | 2.52% 520.omnetpp | 132 | 5.45% | 39 | 0.11% | 1895 | 3.91% 523.xalancbmk_r| 238 | 3.26% | 313 | 0.36% | 1417 | 1.27% 525.x264_r | 4 | 1.36% | 27 | 0.11% | 1752 | 6.78% 531.deepsjeng_r| 1 | 3.45% | 2 | 0.14% | 228 | 8.67% 541.leela_r | 2 | 0.63% | 0 | 0.00% | 92 | 1.14% 548.exchange2_r| 0 | 0.00% | 3 | 0.04% | 49 | 1.03% 557.xz_r | 0 | 0.00% | 3 | 0.07% | 272 | 7.55% There're more basic_blocks and statements removed compared with last implementation, the reasons are: 1) "CONST op CONST" simplification is included. It is missed in previous patch. 2) By inserting RHS of statements on equiv-heads, more N-ary operations can be simplified. One example is in 'ssa-fre-97.c' in the patch file. While jump-threading & vrp also utilize temporary equivalences (so some of the newly removed blocks and statements can also be covered by them), I think this patch is a supplement, in cases when jump threading cannot take place (the original example), or value number info needs to be involved (the 'ssa-fre-97.c' example). Fixed the former issue with non-iterate mode. About recording the temporary equivalences generated by PHIs (i.e. the 'record_equiv_from_previous_cond' stuff), I have to admit it looks strange and the code size is large, but I haven't find a better way yet. Consider a piece of CFG like the one below, if we want to record x==x2 on the true edge when processing bb1, the location (following current practice) will be bb2. But that is not useful at bb5 or bb6, because bb2 doesn't dominate them. And I can't find a place to record x==x1 when processing bb1. If we can record things on edges rather than blocks, say x==x1 on 1->3 and x==x2 on 1->2, then perhaps with an extra check for "a!=0", x2 can be a valid equiv-head for x since bb5. But I think it lacks efficiency and is not persuasive. It is more efficient to find a valid previous predicate when processing bb4, because the vn_nary_op_t will be fetched anyway. -------------- | if (a != 0) | bb1 -------------- f | \ t | ------- | | bb2 | | ------- | / ------------------------- | x = PHI<x1(1), x2(2)> | bb3 ------------------------- | .... | -------------- | if (a != 0) | bb4 -------------- |f \t ------------- ------- bb7 | where | | bb5 | ==> where "x==x2" is recorded now | "x==x1" is| ------- | recorded | \ | now | ... ------------- | ------- | bb6 | ==> where "x==x2" needs to be used ------- Although I think I can remove the 'dominator_to_phi_map' and generalize this a little, but the major logic will be similar. So I left this unchanged for now, and would like to hear your suggestions on the new scheme overall. Thanks, Di Zhao -------- Extend FRE with temporary equivalences. 2021-09-16 Di Zhao <diz...@os.amperecomputing.com> gcc/ChangeLog: PR tree-optimization/101186 * tree-ssa-sccvn.c (dominated_by_p_w_unex): Moved upward, no change. (vn_nary_op_get_predicated_value): Moved upward, no change. (is_vn_valid_at_bb): Check if vn_pval is valid at BB. (lookup_equiv_head): Lookup the "equivalence head" of given node. (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes. (vn_tracking_edge): Extracted utility function. (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s. (vn_nary_op_insert_into): Insert new value at the front. (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values from pieces. (simplify_nary_op): Try to simplify N-ary expressions. (push_new_nary_ref): Insert a back-reference to vn_nary_op_t. (vn_nary_op_insert_pieces_predicated): Insert by "equivalence head"s. (is_relaxed_cond_code): Whether operation code is a relaxed condition code derived from original code. (branch_may_come_from_another): Whether there's a path from the one true or false destination to another. (val_equiv_insert): Record temporary equivalence. (record_equiv_from_previous_cond): Record equivalences generated by previous conditions on current BB's true and false edges. (find_predicated_value_by_equivs): Find predicated value by the operands' equivalences. (record_dom_to_phi_bb): Record mappings from immediate dominator to basic_block with PHIs. (vn_phi_insert): Record mapping from the immediate dominator to a PHI node. (visit_nary_op): Add lookup previous results of N-ary operations by equivalences. (free_rpo_vn): Free dominator_to_phi_map. (process_bb): Add lookup predicated values by equivalences. (struct unwind_state): Unwind state of back-refs to vn_nary_op_t. (do_unwind): Unwind the back-refs to vn_nary_op_t. (do_rpo_vn): Update back-reference unwind state. * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the nary map entries. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr68619-2.c: Disable fre. * gcc.dg/tree-ssa/pr71947-1.c: Disable fre. * gcc.dg/tree-ssa/pr71947-2.c: Disable fre. * gcc.dg/tree-ssa/pr71947-3.c: Disable fre. * gcc.dg/tree-ssa/pr71947-5.c: Disable fre. * gcc.dg/tree-ssa/pr71947-7.c: Disable fre. * gcc.dg/tree-ssa/pr71947-8.c: Disable fre. * gcc.dg/tree-ssa/pr71947-9.c: Disable fre. * gcc.dg/tree-ssa/vrp03.c: Disable fre. * gcc.dg/tree-ssa/ssa-fre-95.c: New test. * gcc.dg/tree-ssa/ssa-fre-96.c: New test. * gcc.dg/tree-ssa/ssa-fre-97.c: New test.
v2-tree-optimization-101186.patch
Description: v2-tree-optimization-101186.patch