On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Hi All,
>
> Here is the second patch related to extended predication.
> Few comments which explain a main goal of design.
>
> 1. I don't want to insert any critical edge splitting since it may
> lead to less efficient binaries.
> 2. One special case of extended PHI node predication was introduced
> when #arguments is more than 2 but only two arguments are different
> and one argument has the only occurrence. For such PHI conditional
> scalar reduction is applied.
> This is correspondent to the following statement:
>     if (q1 && q2 && q3) var++
>  New function phi_has_two_different_args was introduced to detect such phi.
> 3. Original algorithm for PHI predication used assumption that at
> least one incoming edge for blocks containing PHI is not critical - it
> guarantees that all computations related to predicate of normal edge
> are already inserted above this block and
> code related to PHI predication can be inserted at the beginning of
> block. But this is not true for critical edges for which predicate
> computations are  in the block where code for phi predication must be
> inserted. So new function find_insertion_point is introduced which is
> simply found out the last statement in block defining predicates
> correspondent to all incoming edges and insert phi predication code
> after it (with some minor exceptions).

Unfortunately the patch doesn't apply for me - I get

patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
predicate_all_scalar_phis (struct loop *loop)

a few remarks nevertheless.  I don't see how we need both
predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
Couldn't we simply sort an array of (edge, value) pairs after value
and handle equal values specially in predicate_extended_scalar_phi?
That would even make PHI <a, a, b, c, c> more optimal.

I don't understand the need for find_insertion_point.  All SSA names
required for the predicates are defined upward - and the complex CFG
is squashed to a single basic-block, thus the defs will dominate the
inserted code if you insert after labels just like for the other case.
Or what am I missing?  ("flattening" of the basic-blocks of course needs
to happen in dominator order - but I guess that happens already?)

I'd like the extended PHI handling to be enablable by a flag even
for !force-vectorization - I've seen cases with 3 PHI args multiple
times that would have been nice to vectorize.  I suggest to
add -ftree-loop-if-convert-aggressive for this.  We can do this as
followup, but please rename the local flag_force_vectorize flag
to something less looking like a flag, like simply 'aggressive'.

Otherwise patch 2 looks ok to me.

Richard.


> ChangeLog:
>
> 2014-10-24  Yuri Rumyantsev  <ysrum...@gmail.com>
>
> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
> FLAG_FORCE_VECTORIZE instead of loop flag.
> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
> FLAG_FORCE_VECTORIZE is true.
> (if_convertible_bb_p): Delete check that bb has at least one
> non-critical incoming edge.
> (phi_has_two_different_args): New function.
> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
> to phi arguments. Invoke phi_has_two_different_args to get phi
> arguments if EXTENDED is true. Change check that block
> containing reduction statement candidate is predecessor
> of phi-block since phi may have more than two arguments.
> (convert_scalar_cond_reduction): Add argument BEFORE to insert
> statement before/after gsi point.
> (predicate_scalar_phi): Add argument false (which means non-extended
> predication) to call of is_cond_scalar_reduction. Add argument
> true (which correspondent to argument BEFORE) to call of
> convert_scalar_cond_reduction.
> (get_predicate_for_edge): New function.
> (predicate_arbitrary_scalar_phi): New function.
> (predicate_extended_scalar_phi): New function.
> (find_insertion_point): New function.
> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
> BEFORE. Initialize EXTENDED to true if BB containing phi has more
> than 2 predecessors or both incoming edges are critical. Invoke
> find_phi_replacement_condition and predicate_scalar_phi or
> find_insertion_point and predicate_extended_scalar_phi depending on
> EXTENDED value.
> (insert_gimplified_predicates): Add check that non-predicated block
> may have statements to insert. Insert predicate of BB just after label
> if FLAG_FORCE_VECTORIZE is true.
> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
> is copy of inner or outer loop field force_vectorize.

Reply via email to