On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi All, > > Here is the second patch related to extended predication. > Few comments which explain a main goal of design. > > 1. I don't want to insert any critical edge splitting since it may > lead to less efficient binaries. > 2. One special case of extended PHI node predication was introduced > when #arguments is more than 2 but only two arguments are different > and one argument has the only occurrence. For such PHI conditional > scalar reduction is applied. > This is correspondent to the following statement: > if (q1 && q2 && q3) var++ > New function phi_has_two_different_args was introduced to detect such phi. > 3. Original algorithm for PHI predication used assumption that at > least one incoming edge for blocks containing PHI is not critical - it > guarantees that all computations related to predicate of normal edge > are already inserted above this block and > code related to PHI predication can be inserted at the beginning of > block. But this is not true for critical edges for which predicate > computations are in the block where code for phi predication must be > inserted. So new function find_insertion_point is introduced which is > simply found out the last statement in block defining predicates > correspondent to all incoming edges and insert phi predication code > after it (with some minor exceptions).
Unfortunately the patch doesn't apply for me - I get patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ predicate_all_scalar_phis (struct loop *loop) a few remarks nevertheless. I don't see how we need both predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. Couldn't we simply sort an array of (edge, value) pairs after value and handle equal values specially in predicate_extended_scalar_phi? That would even make PHI <a, a, b, c, c> more optimal. I don't understand the need for find_insertion_point. All SSA names required for the predicates are defined upward - and the complex CFG is squashed to a single basic-block, thus the defs will dominate the inserted code if you insert after labels just like for the other case. Or what am I missing? ("flattening" of the basic-blocks of course needs to happen in dominator order - but I guess that happens already?) I'd like the extended PHI handling to be enablable by a flag even for !force-vectorization - I've seen cases with 3 PHI args multiple times that would have been nice to vectorize. I suggest to add -ftree-loop-if-convert-aggressive for this. We can do this as followup, but please rename the local flag_force_vectorize flag to something less looking like a flag, like simply 'aggressive'. Otherwise patch 2 looks ok to me. Richard. > ChangeLog: > > 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> > > * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use > FLAG_FORCE_VECTORIZE instead of loop flag. > (if_convertible_bb_p): Allow bb has more than 2 predecessors if > FLAG_FORCE_VECTORIZE is true. > (if_convertible_bb_p): Delete check that bb has at least one > non-critical incoming edge. > (phi_has_two_different_args): New function. > (is_cond_scalar_reduction): Add argument EXTENDED to choose access > to phi arguments. Invoke phi_has_two_different_args to get phi > arguments if EXTENDED is true. Change check that block > containing reduction statement candidate is predecessor > of phi-block since phi may have more than two arguments. > (convert_scalar_cond_reduction): Add argument BEFORE to insert > statement before/after gsi point. > (predicate_scalar_phi): Add argument false (which means non-extended > predication) to call of is_cond_scalar_reduction. Add argument > true (which correspondent to argument BEFORE) to call of > convert_scalar_cond_reduction. > (get_predicate_for_edge): New function. > (predicate_arbitrary_scalar_phi): New function. > (predicate_extended_scalar_phi): New function. > (find_insertion_point): New function. > (predicate_all_scalar_phis): Add two boolean variables EXTENDED and > BEFORE. Initialize EXTENDED to true if BB containing phi has more > than 2 predecessors or both incoming edges are critical. Invoke > find_phi_replacement_condition and predicate_scalar_phi or > find_insertion_point and predicate_extended_scalar_phi depending on > EXTENDED value. > (insert_gimplified_predicates): Add check that non-predicated block > may have statements to insert. Insert predicate of BB just after label > if FLAG_FORCE_VECTORIZE is true. > (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which > is copy of inner or outer loop field force_vectorize.