On Sun, Jul 18, 2021 at 9:25 PM Di Zhao OS <diz...@os.amperecomputing.com> wrote: > > > I tried to improve the patch following your advices and to catch more > opportunities. Hope it'll be helpful.
Sorry for the late reply. > On 6/24/21 8:29 AM, Richard Biener wrote: > > On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches <gcc- > > patc...@gcc.gnu.org> wrote: > > > > I have some reservations about extending the ad-hoc "predicated value" code. > > > > Some comments on the patch: > > > > +/* hashtable & helpers to record equivalences at given bb. */ > > + > > +typedef struct val_equiv_s > > +{ > > + val_equiv_s *next; > > + val_equiv_s *unwind_to; > > + hashval_t hashcode; > > + /* SSA name this val_equiv_s is associated with. */ > > + tree name; > > + /* RESULT in a vn_pval entry is SSA name of a equivalence. */ > > + vn_pval *values; > > +} * val_equiv_t; > > > > all of this (and using a hashtable for recording) is IMHO a bit overkill. > > Since you only ever record equivalences for values the more natural place to > > hook those in is the vn_ssa_aux structure where we also record the > > availability > > chain. > > I tried to store the equivalences in the vn_ssa_aux structure, but I didn't > optimize the second case successfully: I need to record the equivalence > of a PHI expression's result and arguments, but their value number results > will > become VARYING first, so they won't be changed. Maybe I'm missing something, > or > can I force change a VARYING result? But VARYING still has a value-number - it's the result itself? > Besides, the way currently used, equivalences only need to be "predictable" > rather than available, maybe availability chains do not represent them very > well? Sure, they are a different beast - I'm only commenting on the place you store them as being not too efficient. > > There's little commentary in the new code, in particular function-level > > comments are missing everywhere. > > Added more comments. > > > There's complexity issues, like I see val_equiv_insert has a "recurse" > > feature but also find_predicated_value_by_equiv is quadratic in the number > > of > > equivalences of the lhs/rhs. Without knowing what the recursion on the > > former is for - nothing tells me - I suspect either of both should be > > redundant. > > The intention was, given {A==B, B==C, X==Y, Y==Z} and a previous result of > "C opcode Z", to find the result of "A opcode Y". I removed the "recurse" > feature and modified the searching logic so solve the issue. Now a temporary > hash_set is used to record the equivalences that are visited when searching. OK, so you're covering transitivity at query time - that looks expensive to me. As said I wonder if there's a more efficient way to store equivalences here. > > You seem to record equivalences at possible use points which looks odd at > > best > > - I'd expected equivalences being recorded at the same point we record > > predicated values and for the current condition, not the one determining > > some > > other predication. > > What was the motivation to do it the way you do it? > > The purpose is to "bring down" what can be known from a previous basic-block > that effectively dominates current block, but not actually does so (in the > example it is because jump threading is hindered by a loop). For example in > this case: > > if (a != 0) > // Nothing useful can be recorded here, because this BB doesn't dominate > // the BB that we want to simplify. > c = b; > for (unsigned i = 0; i < c; i++) > { > if (a != 0) // The recording is triggered here. > { > // c == b will be recorded here, so it can be used for > simplification. > // In gimple it is the equivalence of a PHI's result and argument. > if (i >= b) > foo (); > > These requires finding a previous condition that is identical with current > one, so it is convenient to do this in FRE. Besides, as FRE records derived > predicate, so for relative conditions there also might be opportunities for > optimization. In the new patch code this is included. I still don't quite understand why you cannot record the c = b equivalence when processing its block. You'd record "c == b if a != 0" and later you look for equivalences on b and see if they are valid at the use site? That's how predicated values work. I'd like to see this equivalence stuff more naturally integrated with the value lattice, not a collection of bolted-on hashmaps. > Besides, to find more opportunities, added a hashmap to store mappings from > immediate dominators to basic-blocks with PHIs of interest. > > > Why is the code conditional on 'iterate'? > > I haven't worked it out to fit the non-iterate mode, so it now breaks the > if_conversion pass. I think this is because some of the equivalence-recordings > are too optimistic for non-iterate mode. Huh. The non-iterative mode should be easier to deal with, in fact if you are running into correctness issues this hints at latent issues even with the iterative mode. > > You've probably realized that there's no "nice" way to handle temporary > > equivalences in a VN scheme using hashing for expressions (unless you > > degrade > > hashes a lot). > > I modified the code to use TREE_HASH on ssa names. Would that be better? > > > You quote opportunities that are catched with this like > > > > + if (a != 0) > > + { > > + c = b; > > + } > > + for (unsigned i = 0; i < c; i++) > > + { > > + if (a != 0) > > + { > > + if (i >= b) > > + /* Should be eliminated. > > + */ > > + foo (); > > > > but say other "obvious" cases like > > > > + if (a != 0) > > + { > > + c = b; > > + } > > + for (unsigned i = 0; i < c; i++) > > + { > > + if (a != 0) > > + { > > + /* Should be zero. */ > > return b - c; > > > > are not handled. That's in line with the current "value predication" > > which mainly aims at catching simple jump threading opportunities; you only > > simplify conditions with the recorded equivalences. But then the > > complexity of > > handling equivalences does probably not outweight the opportunities catched > > - > > can you share some numbers on how many branches are newly known taken > > during VN with this patch during say bootstrap or build of SPEC CPU? > > I extended the code a little to cover the cases like "A - A" and "A xor A". > > Here are some results on one bootstrap step: > | values found | more bb removed | values found in all > | in fre1 | at fre1 | fre & pre passes > ----------------------------------------------------------------- > bootstrap | 592 | 40 | 1272 > > As the code is different for bootstrap, the "more bb removed" metric is not > precise. I also tested on SPEC CPU 2017 integer cases: > | values found | more bb removed | values found in all > | in fre1 | at fre1 | fre & pre passes > ----------------------------------------------------------------- > 500.perlbench_r| 3 | 0 | 9 > 502.gcc_r | 25 | 39 | 241 > 520.omnetpp | 9 | 6 | 34 > 523.xalancbmk_r| 12 | 0 | 35 > 541.leela_r | 2 | 0 | 2 > > In cases not listed above there's no value found by equivalences. Benefits > after fre1 are not counted as CGF may be different from here (for > 523.xalancbmk_r there're 8 more basic-blocks removed at fre3). Although the > chances are not plenty, there might be potential benefits, such as making a > function pure. That's very little improvements for this large pice of code :/ Richard. > > I've hoped to ditch the current "value predication" code by eventually > > using the > > relation oracle from ranger but did not yet have the chance to look into > > that. > > Now, the predicated equivalences are likely not something that > > infrastructure > > can handle? > > > > In the end I think we should research into maintaining an alternate > > expression > > table for conditions (those we like to simplify with > > equivalences) and use a data structure that more easily allows to introduce > > (temporary) equivalences. Like by maintaining back-references of values > > we've > > recorded a condition for and a way to quickly re-canonicalize conditions. > > Well - > > it requires some research, as said. > > > > Richard. > > > > > Regards, > > > Di Zhao > > > > > > Extend FRE with an "equivalence map" for condition prediction. > > > > > > 2021-06-24 Di Zhao <diz...@os.amperecomputing.com> > > > > > Thanks, > Di Zhao > > -------- > Extend FRE with an "equivalence map" for condition prediction. > > 2021-07-18 Di Zhao <diz...@os.amperecomputing.com> > > gcc/ChangeLog: > PR tree-optimization/101186 > * tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function. > (dominated_by_p_w_unex): Moved upward, no change. > (vn_nary_op_get_predicated_value): Moved upward, no change. > (struct val_equiv_hasher): Hasher for the "equivalence map". > (is_vn_valid_at_bb): Check if vn_pval is valid at BB. > (val_equiv_insert): Insert into "equivalence map". > (vn_lookup_binary_op_result): Lookup binary expression's result by VN. > (iterate_val_equivs): Iterate on equivalences and returns a non-NULL > result returned by callback. > (find_predicated_binary_by_lhs_equiv): Callback for > iterate_val_equivs. > Lookup a binary operations result by LHS equivalences. > (find_predicated_binary_by_rhs_equiv): Callback for > iterate_val_equivs. > Lookup a binary operations result by RHS equivalences. > (find_predicated_binary_by_equiv): Lookup predicated value of a binary > operation by equivalences. > (is_relaxed_cond_code): Whether operation code is a relaxed condition > code derived from original code. > (branch_may_come_from_another): Whether there's a path from the one > true or false destination to another. > (record_equiv_from_previous_edge): Record equivalence relation from a > previous condition on current bb' true and false edges. > (record_equiv_from_previous_cond): Record equivalences generated by > previous conditions on current BB's true and false edges. > (vn_nary_op_insert_pieces_predicated): Extract utility function. > Insert > into the "equivalence map" for predicate like "x==y is true". > (record_dom_to_phi_bb): Record mappings from immediate dominator to > basic_block with PHIs. > (vn_phi_lookup): Record mappings from immediate dominator to PHIs. > (visit_nary_op): Add lookup predicated values of binaries by > equivalences. > (free_rpo_vn): Free the "equivalence map". > (process_bb): Insert into & lookup from the "equivalence map". > (struct unwind_state): Add "equivalence map" unwind state. > (do_unwind): Unwind the "equivalence map". > (do_rpo_vn): Update "equivalence map" unwind state. > > gcc/testsuite/ChangeLog: > PR tree-optimization/101186 > * gcc.dg/tree-ssa/pr71947-1.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-2.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-3.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-5.c: Disable fre. > * gcc.dg/tree-ssa/vrp03.c: Disable fre. > * gcc.dg/tree-ssa/ssa-fre-95.c: New test. > * gcc.dg/tree-ssa/ssa-fre-96.c: New test.