Richard, Here is updated patch (part1) for extended if conversion.
Second part of patch will be sent later. Changelog. 2014-10-13 Yuri Rumyantsev <ysrum...@gmail.com> * tree-if-conv.c (cgraph.h): Add include file to detect function clone. (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_predicate_list): Check unconditionally that bb is always executed to early exit. Use predicate of cd-equivalent block for join blocks if it exists. (add_to_dst_predicate_list): Invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_edges_are_critical): New function. (if_convertible_bb_p): Allow bb has more than two predecessors if FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical to reject block if-conversion with incoming critical edges only if FLAG_FORCE_VECTORIZE was not set-up. (predicate_bbs): Skip loop exit block also. Add check that if fold_build2 produces bool conversion, recompute predicate using build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE. (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and atleast one incoming edge is not critical original algorithm is used. (get_predicate_for_edge): New function. (find_insertion_point): New function. (predicate_arbitrary_scalar_phi): New function. (predicate_all_scalar_phis): Introduce new variable BEFORE. Invoke find_insertion_point to initialize gsi and predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals that extended predication must be applied). (insert_gimplified_predicates): Add test for non-predicated basic blocks that there are no gimplified statements to insert. Insert predicates at the block begining for extended if-conversion. (tree_if_conversion): Initialize flag_force_vectorize from current loop or outer loop (to support pragma omp declare).Do loop versioning for innermost loop marked with pragma omp simd and FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges for blocks with two successors. 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: > Richard, > > here is reduced patch (part.1) which was reduced almost twice. > Let's me also answer on your comments. > > 1. I really use edge field 'aux' to keep predicate for critical edges. > My previous code was not correct and now it looks like: > > if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1) > /* Edge E is not critical, use predicate of edge source bb. */ > c = bb_predicate (b); > else > /* Edge E is critical and its aux field contains predicate. */ > c = edge_predicate (e); > > 2. I completely delete all code related to creation of conditional > expressions and completely rely on bool pattern recognition in > vectorizer. But we need to delete all dead predicate computations > which are not used since they prevent vectorization. I will add this > local-dce function in next patch. > 3. I also did not include in this patch recognition of general > phi-nodes with two arguments only for which conversion of conditional > scalar reduction can be applied also. > Note that all these changes are applied for loop marked with pragma > omp simd only. > > 2014-09-22 Yuri Rumyantsev <ysrum...@gmail.com> > > * tree-if-conv.c (cgraph.h): Add include file to detect function clone. > (flag_force_vectorize): New variable. > (edge_predicate): New function. > (set_edge_predicate): New function. > (convert_name_to_cmp): New function. > (add_to_predicate_list): Check unconditionally that bb is always > executed to early exit. Use predicate of cd-equivalent block > for join blocks if it exists. > (add_to_dst_predicate_list): Invoke add_to_predicate_list if > destination block of edge is not always executed. Set-up predicate > for critical edge. > (if_convertible_phi_p): Accept phi nodes with more than two args > if FLAG_FORCE_VECTORIZE was set-up. > (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. > (if_convertible_stmt_p): Fix up pre-function comments. > (all_edges_are_critical): New function. > (if_convertible_bb_p): Allow bb has more than two predecessors if > FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical > to reject block if-conversion with incoming critical edges only if > FLAG_FORCE_VECTORIZE was not set-up. > (predicate_bbs): Skip loop exit block also. Add check that if > fold_build2 produces bool conversion, recompute predicate using > build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE. > (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if > FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. > (find_phi_replacement_condition): Extend function interface: > it returns NULL if given phi node must be handled by means of > extended phi node predication. If number of predecessors of phi-block > is equal 2 and atleast one incoming edge is not critical original > algorithm is used. > (get_predicate_for_edge): New function. > (find_insertion_point): New function. > (predicate_arbitrary_scalar_phi): New function. > (predicate_all_scalar_phis): Introduce new variable BEFORE. > Invoke find_insertion_point to initialize gsi and > predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals > that extended predication must be applied). > (insert_gimplified_predicates): Add test for non-predicated basic > blocks that there are no gimplified statements to insert. Insert > predicates at the block begining for extended if-conversion. > (tree_if_conversion): Initialize flag_force_vectorize from current > loop or outer loop (to support pragma omp declare).Do loop versioning > for innermost loop marked with pragma omp simd and > FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges > for blocks with two successors. > > > > > 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard! >>> Here is updated patch with the following changes: >>> >>> 1. Any restrictions on phi-function were eliminated for extended conversion. >>> 2. Put predicate for critical edges to 'aux' field of edge, i.e. >>> negate_predicate was deleted. >>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can >>> be critical. >>> 4. Use notion of cd-equivalence to set-up predicate for join basic >>> blocks to simplify it. >>> 5. I decided to not design pre-pass since it will lead generating >>> chain of cond expressions for phi-node if conversion, whereas for phi >>> of kind >>> x = PHI <1(2), 1(3), 2(4)> >>> only one cond expression is required and this is considered as simple >>> optimization for arbitrary phi-function. More precise, >>> if phi-function have only two different arguments and one of them has >>> single occurrence, if- conversion is performed as if phi have only 2 >>> arguments. >>> For arbitrary phi function a chain of cond expressions is produced. >>> >>> Updated patch is attached. >>> >>> Any comments will be appreciated. >> >> The patch is still very big and does multiple things at once which makes >> it hard to review. >> >> In addition to that it changes function singatures without updating >> the function comments. For example what is the convert_bool >> argument doing to add_to_dst_predicate_list? Why do we need >> all this added logic. >> >> You duplicate operand_equal_for_phi_arg_p. >> >> I think the code handling PHIs with more than two operands but >> only two unequal operands is useful generally, so that's an obvious >> candidate for splitting out into a separate patch. >> >> + CONVERT_BOOL argument was added to convert bool predicate computations >> + which is not supported by vectorizer to int type through creating of >> + conditional expressions. */ >> >> Example? The vectorizer has patterns for bool predicate computations. >> This seems to be another feature that needs splitting out. >> >> The way you get around the critical edge parts looks awkward to me. >> Please either do _all_ predicates as edge predicates or simply >> split critical edges (of the respective loop body). >> >> I still think that an utility doing same PHI arg merging by introducing >> forwarder blocks would be nicer to have. >> >> I'd restructure the main tree_if_conversion function to apply these >> CFG pre-transforms when we are going to version the loop >> for if conversion (eventually transitioning to always doing that). >> >> So - please split up the patch. It's way too big. >> >> Thanks, >> Richard. >> >>> 2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com> >>> >>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone. >>> (flag_force_vectorize): New variable. >>> (edge_predicate): New function. >>> (set_edge_predicate): New function. >>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>> (init_bb_predicate): Add initialization of negate_predicate field. >>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>> (convert_name_to_cmp): New function. >>> (get_type_for_cond): New function. >>> (convert_bool_predicate): New function. >>> (predicate_disjunction): New function. >>> (predicate_conjunction): New function. >>> (add_to_predicate_list): Add convert_bool argument. >>> Use predicate of cd-equivalent block if convert_bool is true and >>> such bb exists; save it in static variable for further possible use. >>> Add call of predicate_disjunction if convert_bool argument is true. >>> (add_to_dst_predicate_list): Add convert_bool argument. >>> Add early function exit if edge target block is always executed. >>> Add call of predicate_conjunction if convert_bool argument is true. >>> Pass convert_bool argument for add_to_predicate_list. >>> Set-up predicate for crritical edge if convert_bool is true. >>> (equal_phi_args): New function. >>> (phi_has_two_different_args): New function. >>> (if_convertible_phi_p): Accept phi nodes with more than two args >>> if flag_force_vectorize wa set-up. >>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize. >>> (if_convertible_stmt_p): Allow calls of function clones if >>> flag_force_vectorize was set-up. >>> (all_edges_are_critical): New function. >>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>> flag_force_vectorize was set-up. Use call of all_edges_are_critical >>> to reject block if-conversion with imcoming critical edges only if >>> flag_force_vectorize was not set-up. >>> (walk_cond_tree): New function. >>> (vect_bool_pattern_is_applicable): New function. >>> (predicate_bbs): Add convert_bool argument which is used to transform >>> comparison expressions of boolean type into conditional expressions >>> with integral operands. If convert_bool argument was set-up and >>> vect bool pattern can be appied perform the following transformation: >>> (bool) x != 0 --> y = (int) x; x != 0; >>> Add check that if fold_build2 produces bool conversion if convert_bool >>> was set-up, recompute predicate using build2_loc. Additional argument >>> 'convert_bool" is passed to add_to_dst_predicate_list and >>> add_to_predicate_list. >>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>> flag_force_vectorize was set-up to calculate cd equivalent bb's. >>> Call predicate_bbs with additional argument equal to false. >>> (find_phi_replacement_condition): Extend function interface: >>> it returns NULL if given phi node must be handled by means of >>> extended phi node predication. If number of predecessors of phi-block >>> is equal 2 and atleast one incoming edge is not critical original >>> algorithm is used. >>> (is_cond_scalar_reduction): Add 'extended' argument which signals that >>> phi arguments must be evaluated through phi_has_two_different_args. >>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond >>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction. >>> (get_predicate_for_edge): New function. >>> (find_insertion_point): New function. >>> (predicate_arbitrary_phi): New function. >>> (predicate_extended_scalar_phi): New function. >>> (predicate_all_scalar_phis): Add code to set-up gimple statement >>> iterator for predication of extended scalar phi's for insertion. >>> (insert_gimplified_predicates): Add test for non-predicated basic >>> blocks that there are no gimplified statements to insert. Insert >>> predicates at the block begining for extended if-conversion. >>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >>> predication to build mask. >>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>> (tree_if_conversion): Initialize flag_force_vectorize from current >>> loop or outer loop (to support pragma omp declare).Do loop versioning >>> for innermost loop marked with pragma omp simd. >>> >>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Hi All, >>>>> >>>>> We implemented additional support for pragma omp simd in part of >>>>> extended if-conversion loops with such pragma. These extensions >>>>> include: >>>>> >>>>> 1. All extensions are performed only if considered loop or its outer >>>>> loop was marked with pragma omp simd (force_vectorize); For ordinary >>>>> loops behavior was not changed. >>>>> 2. Took off cfg restriction on basic block which can have more than 2 >>>>> predecessors. >>>>> 3. Put additional restriction on phi nodes which was missed in current >>>>> design: >>>>> all phi nodes must be in non-predicated basic block to conform >>>>> semantic of COND_EXPR which is used for transformation. >>>> >>>> How is that so? If the PHI is predicated then its result will be used >>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs. >>>> >>>> No? >>>> >>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments >>>>> with some limitations: >>>>> - for phi nodes which have more than 2 arguments, but only two >>>>> arguments are different and one of them has the only occurence, >>>>> transformation to single COND_EXPR can be done. >>>>> - if phi node has more different arguments and all edge predicates >>>>> correspondent to phi-arguments are disjoint, a chain of COND_EXPR >>>>> will be generated for it. In current design very simple check is used: >>>>> check starting from end that two edges correspondent to neighbor >>>>> arguments have common predecessor which is used for further check >>>>> with next edge. >>>>> These guarantee that phi predication will produce the correct result. >>>> >>>> Btw, you can think of these extensions as unfactoring a PHI node by >>>> inserting forwarder blocks. Thus >>>> >>>> x = PHI <1(2), 1(3), 2(4)> >>>> >>>> becomes >>>> >>>> bb 5: <forwarder-from(2)-and(3)> >>>> >>>> x = PHI <1(5), 2(4)> >>>> >>>> and >>>> >>>> x = PHI <1(2), 2(3), 3(4)> >>>> >>>> becomes >>>> >>>> bb 5: >>>> x' = PHI <1(2), 2(3)> >>>> >>>> b = PHI<x'(5), 3(4)> >>>> >>>> which means that 3) has to work. Note that we want this kind of >>>> PHI transforms for out-of-SSA as well to reduce the number of >>>> copies we need to insert on edges. >>>> >>>> Thus it would be nice if you implemented 4) in terms of a pre-pass >>>> over the force_vect loops PHI nodes, applying that CFG transform. >>>> And make 3) work properly if it doesn't already. >>>> >>>> It looks like you introduce a "negate predicate" to work around the >>>> critical edge limitation? Please instead change if-conversion to >>>> work with edge predicates (as opposed to BB predicates). >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> >>>>> Here is example of such extended predication (compile with >>>>> -march=core-avx2): >>>>> #pragma omp simd safelen(8) >>>>> for (i=0; i<512; i++) >>>>> { >>>>> float t = a[i]; >>>>> if (t > 0 & t < 1.0e+17f) >>>>> if (c[i] != 0) >>>>> res += 1; >>>>> } >>>>> <bb 4>: >>>>> # res_15 = PHI <res_1(5), 0(3)> >>>>> # i_16 = PHI <i_11(5), 0(3)> >>>>> # ivtmp_17 = PHI <ivtmp_14(5), 512(3)> >>>>> t_5 = a[i_16]; >>>>> _6 = t_5 > 0.0; >>>>> _7 = t_5 < 9.9999998430674944e+16; >>>>> _8 = _7 & _6; >>>>> _ifc__28 = (unsigned int) _8; >>>>> _10 = &c[i_16]; >>>>> _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0; >>>>> _9 = MASK_LOAD (_10, 0B, _ifc__36); >>>>> _ifc__29 = _ifc__28 != 0 ? 1 : 0; >>>>> _ifc__30 = (int) _ifc__29; >>>>> _ifc__31 = _9 != 0 ? _ifc__30 : 0; >>>>> _ifc__32 = _ifc__28 != 0 ? 1 : 0; >>>>> _ifc__33 = (int) _ifc__32; >>>>> _ifc__34 = _9 == 0 ? _ifc__33 : 0; >>>>> _ifc__35 = _ifc__31 != 0 ? 1 : 0; >>>>> res_1 = res_15 + _ifc__35; >>>>> i_11 = i_16 + 1; >>>>> ivtmp_14 = ivtmp_17 - 1; >>>>> if (ivtmp_14 != 0) >>>>> goto <bb 4>; >>>>> >>>>> Bootstrap and regression testing did not show any new failures. >>>>> >>>>> gcc/ChageLog >>>>> >>>>> 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com> >>>>> >>>>> * tree-if-conv.c (flag_force_vectorize): New variable. >>>>> (struct bb_predicate_s): Add negate_predicate field. >>>>> (bb_negate_predicate): New function. >>>>> (set_bb_negate_predicate): New function. >>>>> (bb_copy_predicate): New function. >>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>> (init_bb_predicate): Add initialization of negate_predicate field. >>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>> (convert_name_to_cmp): New function. >>>>> (get_type_for_cond): New function. >>>>> (convert_bool_predicate): New function. >>>>> (predicate_disjunction): New function. >>>>> (predicate_conjunction): New function. >>>>> (add_to_predicate_list): Add convert_bool argument. >>>>> Add call of predicate_disjunction if convert_bool argument is true. >>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>> Add early function exit if edge target block is always executed. >>>>> Add call of predicate_conjunction if convert_bool argument is true. >>>>> Pass convert_bool argument for add_to_predicate_list. >>>>> (equal_phi_args): New function. >>>>> (phi_has_two_different_args): New function. >>>>> (phi_args_disjoint): New function. >>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>> for loops marked with pragma omp simd. Add check that phi nodes are >>>>> in non-predicated basic blocks. >>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize. >>>>> (all_edges_are_critical): New function. >>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical >>>>> to reject block if-conversion with imcoming critical edges only if >>>>> flag_force_vectorize was not setup. >>>>> (walk_cond_tree): New function. >>>>> (vect_bool_pattern_is_applicable): New function. >>>>> (predicate_bbs): Add convert_bool argument that is used to transform >>>>> comparison expressions of boolean type into conditional expressions >>>>> with integral operands. If bool_conv argument is false or both >>>>> outgoing edges are not critical old algorithm of predicate assignments >>>>> is used, otherwise the following code was added: check on applicable >>>>> of vect-bool-pattern recognition and trnasformation of >>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>> compute predicates for both outgoing edges one of which is critical >>>>> one using 'normal' edge, i.e. compute true and false predicates using >>>>> normal outgoing edge only; evaluated predicates are stored in >>>>> predicate and negate_predicate fields of struct bb_predicate_s and >>>>> negate_predicate of normal edge conatins predicate of critical edge, >>>>> but generated gimplified statements are stored in their destination >>>>> block fields. Additional argument 'convert_bool" is passed to >>>>> add_to_dst_predicate_list and add_to_predicate_list. >>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument >>>>> equal to false. >>>>> (find_phi_replacement_condition): Extend function interface: >>>>> it returns NULL if given phi node must be handled by means of >>>>> extended phi node predication. If number of predecessors of phi-block >>>>> is equal 2 and atleast one incoming edge is not critical original >>>>> algorithm is used. >>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that >>>>> both phi arguments must be evaluated through phi_has_two_different_args. >>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond >>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction. >>>>> (get_predicate_for_edge): New function. >>>>> (find_insertion_point): New function. >>>>> (predicate_phi_disjoint_args): New function. >>>>> (predicate_extended_scalar_phi): New function. >>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement >>>>> iterator for predication of extended scalar phi's for insertion. >>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>> blocks that there are no gimplified statements to insert. Insert >>>>> predicates at the block begining for extended if-conversion. >>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >>>>> predication to build mask. >>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>> (split_crit_edge): New function. >>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>> loop or outer loop (to support pragma omp declare). Invoke >>>>> split_crit_edge for extended predication. Do loop versioning for >>>>> innermost loop marked with pragma omp simd.
patch.part-1
Description: Binary data