Richard,

here is reduced patch (part.1) which was reduced almost twice.
Let's me also answer on your comments.

1. I really use edge field 'aux' to keep predicate for critical edges.
My previous code was not correct and now it looks like:

  if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
    /* Edge E is not critical,  use predicate of edge source bb. */
    c = bb_predicate (b);
  else
    /* Edge E is critical and its aux field contains predicate.  */
    c = edge_predicate (e);

2. I completely delete all code related to creation of conditional
expressions and completely rely on bool pattern recognition in
vectorizer. But we need to delete all dead predicate computations
which are not used since they prevent vectorization. I will add this
local-dce function in next patch.
3. I also did not include in this patch recognition of general
phi-nodes with two arguments only for which conversion of conditional
scalar reduction can be applied also.
Note that all these changes are applied for loop marked with pragma
omp simd only.

2014-09-22  Yuri Rumyantsev  <ysrum...@gmail.com>

* tree-if-conv.c (cgraph.h): Add include file to detect function clone.
(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(convert_name_to_cmp): New function.
(add_to_predicate_list): Check unconditionally that bb is always
executed to early exit. Use predicate of cd-equivalent block
for join blocks if it exists.
(add_to_dst_predicate_list): Invoke add_to_predicate_list if
destination block of edge is not always executed. Set-up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE was set-up.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Allow bb has more than two predecessors if
FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
to reject block if-conversion with incoming critical edges only if
FLAG_FORCE_VECTORIZE was not set-up.
(predicate_bbs): Skip loop exit block also. Add check that if
fold_build2 produces bool conversion, recompute predicate using
build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
(if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If number of predecessors of phi-block
is equal 2 and atleast one incoming edge is not critical original
algorithm is used.
(get_predicate_for_edge): New function.
(find_insertion_point): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_all_scalar_phis): Introduce new variable BEFORE.
Invoke find_insertion_point to initialize gsi and
predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
that extended predication must be applied).
(insert_gimplified_predicates): Add test for non-predicated basic
blocks that there are no gimplified statements to insert. Insert
predicates at the block begining for extended if-conversion.
(tree_if_conversion): Initialize flag_force_vectorize from current
loop or outer loop (to support pragma omp declare).Do loop versioning
for innermost loop marked with pragma omp simd and
FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
for blocks with two successors.




2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard!
>> Here is updated patch with the following changes:
>>
>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>> negate_predicate was deleted.
>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>> be critical.
>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>> blocks to simplify it.
>> 5. I decided to not design pre-pass since it will lead generating
>> chain of cond expressions for phi-node if conversion, whereas for phi
>> of kind
>>   x = PHI <1(2), 1(3), 2(4)>
>> only one cond expression is required and this is considered as simple
>> optimization for arbitrary phi-function. More precise,
>> if phi-function have only two different arguments and one of them has
>> single occurrence, if- conversion is performed as if phi have only 2
>> arguments.
>> For arbitrary phi function a chain of cond expressions is produced.
>>
>> Updated patch is attached.
>>
>> Any comments will be appreciated.
>
> The patch is still very big and does multiple things at once which makes
> it hard to review.
>
> In addition to that it changes function singatures without updating
> the function comments.  For example what is the convert_bool
> argument doing to add_to_dst_predicate_list?  Why do we need
> all this added logic.
>
> You duplicate operand_equal_for_phi_arg_p.
>
> I think the code handling PHIs with more than two operands but
> only two unequal operands is useful generally, so that's an obvious
> candidate for splitting out into a separate patch.
>
> +   CONVERT_BOOL argument was added to convert bool predicate computations
> +   which is not supported by vectorizer to int type through creating of
> +   conditional expressions.  */
>
> Example?  The vectorizer has patterns for bool predicate computations.
> This seems to be another feature that needs splitting out.
>
> The way you get around the critical edge parts looks awkward to me.
> Please either do _all_ predicates as edge predicates or simply
> split critical edges (of the respective loop body).
>
> I still think that an utility doing same PHI arg merging by introducing
> forwarder blocks would be nicer to have.
>
> I'd restructure the main tree_if_conversion function to apply these
> CFG pre-transforms when we are going to version the loop
> for if conversion (eventually transitioning to always doing that).
>
> So - please split up the patch.  It's way too big.
>
> Thanks,
> Richard.
>
>> 2014-08-15  Yuri Rumyantsev  <ysrum...@gmail.com>
>>
>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>> (flag_force_vectorize): New variable.
>> (edge_predicate): New function.
>> (set_edge_predicate): New function.
>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>> (init_bb_predicate): Add initialization of negate_predicate field.
>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>> (convert_name_to_cmp): New function.
>> (get_type_for_cond): New function.
>> (convert_bool_predicate): New function.
>> (predicate_disjunction): New function.
>> (predicate_conjunction): New function.
>> (add_to_predicate_list): Add convert_bool argument.
>> Use predicate of cd-equivalent block if convert_bool is true and
>> such bb exists; save it in static variable for further possible use.
>> Add call of predicate_disjunction if convert_bool argument is true.
>> (add_to_dst_predicate_list): Add convert_bool argument.
>> Add early function exit if edge target block is always executed.
>> Add call of predicate_conjunction if convert_bool argument is true.
>> Pass convert_bool argument for add_to_predicate_list.
>> Set-up predicate for crritical edge if convert_bool is true.
>> (equal_phi_args): New function.
>> (phi_has_two_different_args): New function.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> if flag_force_vectorize wa set-up.
>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>> (if_convertible_stmt_p): Allow calls of function clones if
>> flag_force_vectorize was set-up.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>> to reject block if-conversion with imcoming critical edges only if
>> flag_force_vectorize was not set-up.
>> (walk_cond_tree): New function.
>> (vect_bool_pattern_is_applicable): New function.
>> (predicate_bbs): Add convert_bool argument which is used to transform
>> comparison expressions of boolean type into conditional expressions
>> with integral operands. If convert_bool argument was set-up and
>> vect bool pattern can be appied perform the following transformation:
>> (bool) x != 0  --> y = (int) x; x != 0;
>> Add check that if fold_build2 produces bool conversion if convert_bool
>> was set-up, recompute predicate using build2_loc. Additional argument
>> 'convert_bool" is passed to add_to_dst_predicate_list and
>> add_to_predicate_list.
>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>> Call predicate_bbs with additional argument equal to false.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal 2 and atleast one incoming edge is not critical original
>> algorithm is used.
>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>> phi arguments must be evaluated through phi_has_two_different_args.
>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>> (get_predicate_for_edge): New function.
>> (find_insertion_point): New function.
>> (predicate_arbitrary_phi): New function.
>> (predicate_extended_scalar_phi): New function.
>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>> iterator for predication of extended scalar phi's for insertion.
>> (insert_gimplified_predicates): Add test for non-predicated basic
>> blocks that there are no gimplified statements to insert. Insert
>> predicates at the block begining for extended if-conversion.
>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>> predication to build mask.
>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>> (tree_if_conversion): Initialize flag_force_vectorize from current
>> loop or outer loop (to support pragma omp declare).Do loop versioning
>> for innermost loop marked with pragma omp simd.
>>
>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> We implemented additional support for pragma omp simd in part of
>>>> extended if-conversion loops with such pragma. These extensions
>>>> include:
>>>>
>>>> 1. All extensions are performed only if considered loop or its outer
>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>    loops behavior was not changed.
>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>    predecessors.
>>>> 3. Put additional restriction on phi nodes which was missed in current 
>>>> design:
>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>    semantic of COND_EXPR which is used for transformation.
>>>
>>> How is that so?  If the PHI is predicated then its result will be used
>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>
>>> No?
>>>
>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>> with some limitations:
>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>    arguments are different and one of them has the only occurence,
>>>> transformation to  single COND_EXPR can be done.
>>>>    - if phi node has more different arguments and all edge predicates
>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>    will be generated for it. In current design very simple check is used:
>>>>    check starting from end that two edges correspondent to neighbor
>>>> arguments have common predecessor which is used for further check
>>>> with next edge.
>>>>  These guarantee that phi predication will produce the correct result.
>>>
>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>> inserting forwarder blocks.  Thus
>>>
>>>    x = PHI <1(2), 1(3), 2(4)>
>>>
>>> becomes
>>>
>>>   bb 5: <forwarder-from(2)-and(3)>
>>>
>>>   x = PHI <1(5), 2(4)>
>>>
>>> and
>>>
>>>   x = PHI <1(2), 2(3), 3(4)>
>>>
>>> becomes
>>>
>>>   bb 5:
>>>   x' = PHI <1(2), 2(3)>
>>>
>>>   b = PHI<x'(5), 3(4)>
>>>
>>> which means that 3) has to work.  Note that we want this kind of
>>> PHI transforms for out-of-SSA as well to reduce the number of
>>> copies we need to insert on edges.
>>>
>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>> And make 3) work properly if it doesn't already.
>>>
>>> It looks like you introduce a "negate predicate" to work around the
>>> critical edge limitation?  Please instead change if-conversion to
>>> work with edge predicates (as opposed to BB predicates).
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Here is example of such extended predication (compile with 
>>>> -march=core-avx2):
>>>> #pragma omp simd safelen(8)
>>>>   for (i=0; i<512; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0 & t < 1.0e+17f)
>>>>       if (c[i] != 0)
>>>> res += 1;
>>>>   }
>>>>   <bb 4>:
>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>   t_5 = a[i_16];
>>>>   _6 = t_5 > 0.0;
>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>   _8 = _7 & _6;
>>>>   _ifc__28 = (unsigned int) _8;
>>>>   _10 = &c[i_16];
>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>   _ifc__30 = (int) _ifc__29;
>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>   _ifc__33 = (int) _ifc__32;
>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>   res_1 = res_15 + _ifc__35;
>>>>   i_11 = i_16 + 1;
>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>   if (ivtmp_14 != 0)
>>>>     goto <bb 4>;
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>>
>>>> gcc/ChageLog
>>>>
>>>> 2014-06-25  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>
>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>> (bb_negate_predicate): New function.
>>>> (set_bb_negate_predicate): New function.
>>>> (bb_copy_predicate): New function.
>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>> (convert_name_to_cmp): New function.
>>>> (get_type_for_cond): New function.
>>>> (convert_bool_predicate): New function.
>>>> (predicate_disjunction): New function.
>>>> (predicate_conjunction): New function.
>>>> (add_to_predicate_list): Add convert_bool argument.
>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>> Add early function exit if edge target block is always executed.
>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>> Pass convert_bool argument for add_to_predicate_list.
>>>> (equal_phi_args): New function.
>>>> (phi_has_two_different_args): New function.
>>>> (phi_args_disjoint): New function.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>> in non-predicated basic blocks.
>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>> (all_edges_are_critical): New function.
>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>> to reject block if-conversion with imcoming critical edges only if
>>>> flag_force_vectorize was not setup.
>>>> (walk_cond_tree): New function.
>>>> (vect_bool_pattern_is_applicable): New function.
>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>> comparison expressions of boolean type into conditional expressions
>>>> with integral operands. If bool_conv argument is false or both
>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>> is used, otherwise the following code was added: check on applicable
>>>> of vect-bool-pattern recognition and trnasformation of
>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>> compute predicates for both outgoing edges one of which is critical
>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>> normal outgoing edge only; evaluated predicates are stored in
>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>> but generated gimplified statements are stored in their destination
>>>> block fields. Additional argument 'convert_bool" is passed to
>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>> equal to false.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal 2 and atleast one incoming edge is not critical original
>>>> algorithm is used.
>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>> (get_predicate_for_edge): New function.
>>>> (find_insertion_point): New function.
>>>> (predicate_phi_disjoint_args): New function.
>>>> (predicate_extended_scalar_phi): New function.
>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>> iterator for predication of extended scalar phi's for insertion.
>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>> blocks that there are no gimplified statements to insert. Insert
>>>> predicates at the block begining for extended if-conversion.
>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>> predication to build mask.
>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>> (split_crit_edge): New function.
>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>> split_crit_edge for extended predication. Do loop versioning for
>>>> innermost loop marked with pragma omp simd.

Attachment: patch.part1
Description: Binary data

Reply via email to