Hi Richard,

I resend you patch1 and patch2 with minor changes:
1. I renamed flag_force_vectorize to aggressive_if_conv.
2. Use static cast for the first argument of gimple_phi_arg_edge.
I also very sorry that I sent you bad patch.

Now let me answer on your questions related to second patch.
1. Why we need both predicate_extended_scalar_phi and
predicate_arbitrary_scalar_phi?

Let's consider the following simple test-case:

  #pragma omp simd safelen(8)
  for (i=0; i<512; i++)
  {
    float t = a[i];
    if (t > 0.0f & t < 1.0e+17f)
      if (c[i] != 0)  /* c is integer array. */
res += 1;
  }

we can see the following phi node correspondent to res:

# res_1 = PHI <res_15(3), res_15(4), res_10(5)>

It is clear that we can optimize it to phi node with 2 arguments only
and only one check can be used for phi predication (for reduction in
our case), namely predicate of bb_5. In general case we can't do it
even if we sort all phi argument values since we still have to produce
a chain of cond expressions to perform phi predication (see comments
for predicate_arbitrary_scalar_phi).
2. Why we need to introduce find_insertion_point?
 Let's consider another test-case extracted from 175.vpr ( t5.c is
attached) and we can see that bb_7 and bb_9 containig phi nodes has
only critical incoming edges and both contain code computing edge
predicates, e.g.

<bb 7>:
# xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
_46 = xmax_17 == xmax_37;
_47 = xmax_17 == xmax_27;
_48 = _46 & _47;
_53 = xmax_17 == xmax_37;
_54 = ~_53;
_55 = xmax_17 == xmax_27;
_56 = _54 & _55;
_57 = _48 | _56;
xmax_edge_19 = xmax_edge_39 + 1;
goto <bb 11>;

It is evident that we can not put phi predication at the block
beginning but need to put it after predicate computations.
Note also that if there are no critical edges for phi arguments
insertion point will be "after labels" Note also that phi result can
have use in this block too, so we can't put predication code to the
block end.

Let me know if you still have any questions.

Best regards.
Yuri.




2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Hi All,
>>
>> Here is the second patch related to extended predication.
>> Few comments which explain a main goal of design.
>>
>> 1. I don't want to insert any critical edge splitting since it may
>> lead to less efficient binaries.
>> 2. One special case of extended PHI node predication was introduced
>> when #arguments is more than 2 but only two arguments are different
>> and one argument has the only occurrence. For such PHI conditional
>> scalar reduction is applied.
>> This is correspondent to the following statement:
>>     if (q1 && q2 && q3) var++
>>  New function phi_has_two_different_args was introduced to detect such phi.
>> 3. Original algorithm for PHI predication used assumption that at
>> least one incoming edge for blocks containing PHI is not critical - it
>> guarantees that all computations related to predicate of normal edge
>> are already inserted above this block and
>> code related to PHI predication can be inserted at the beginning of
>> block. But this is not true for critical edges for which predicate
>> computations are  in the block where code for phi predication must be
>> inserted. So new function find_insertion_point is introduced which is
>> simply found out the last statement in block defining predicates
>> correspondent to all incoming edges and insert phi predication code
>> after it (with some minor exceptions).
>
> Unfortunately the patch doesn't apply for me - I get
>
> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
> predicate_all_scalar_phis (struct loop *loop)
>
> a few remarks nevertheless.  I don't see how we need both
> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
> Couldn't we simply sort an array of (edge, value) pairs after value
> and handle equal values specially in predicate_extended_scalar_phi?
> That would even make PHI <a, a, b, c, c> more optimal.
>
> I don't understand the need for find_insertion_point.  All SSA names
> required for the predicates are defined upward - and the complex CFG
> is squashed to a single basic-block, thus the defs will dominate the
> inserted code if you insert after labels just like for the other case.
> Or what am I missing?  ("flattening" of the basic-blocks of course needs
> to happen in dominator order - but I guess that happens already?)
>
> I'd like the extended PHI handling to be enablable by a flag even
> for !force-vectorization - I've seen cases with 3 PHI args multiple
> times that would have been nice to vectorize.  I suggest to
> add -ftree-loop-if-convert-aggressive for this.  We can do this as
> followup, but please rename the local flag_force_vectorize flag
> to something less looking like a flag, like simply 'aggressive'.
>
> Otherwise patch 2 looks ok to me.
>
> Richard.
>
>
>> ChangeLog:
>>
>> 2014-10-24  Yuri Rumyantsev  <ysrum...@gmail.com>
>>
>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>> FLAG_FORCE_VECTORIZE instead of loop flag.
>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>> FLAG_FORCE_VECTORIZE is true.
>> (if_convertible_bb_p): Delete check that bb has at least one
>> non-critical incoming edge.
>> (phi_has_two_different_args): New function.
>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>> to phi arguments. Invoke phi_has_two_different_args to get phi
>> arguments if EXTENDED is true. Change check that block
>> containing reduction statement candidate is predecessor
>> of phi-block since phi may have more than two arguments.
>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>> statement before/after gsi point.
>> (predicate_scalar_phi): Add argument false (which means non-extended
>> predication) to call of is_cond_scalar_reduction. Add argument
>> true (which correspondent to argument BEFORE) to call of
>> convert_scalar_cond_reduction.
>> (get_predicate_for_edge): New function.
>> (predicate_arbitrary_scalar_phi): New function.
>> (predicate_extended_scalar_phi): New function.
>> (find_insertion_point): New function.
>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>> than 2 predecessors or both incoming edges are critical. Invoke
>> find_phi_replacement_condition and predicate_scalar_phi or
>> find_insertion_point and predicate_extended_scalar_phi depending on
>> EXTENDED value.
>> (insert_gimplified_predicates): Add check that non-predicated block
>> may have statements to insert. Insert predicate of BB just after label
>> if FLAG_FORCE_VECTORIZE is true.
>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>> is copy of inner or outer loop field force_vectorize.
#define N 512
#define max(x,y) (x) >= (y)? (x) : (y)
#define min(x,y) (x) <= (y)? (x) : (y)
int c_X[N];
int x_max;
int x_min;
extern int nx;

void foo (int n)
{
  int i, x;
  int xmin, xmax;
  int xmin_edge, xmax_edge;

  x = c_X[0];
  xmin = xmax = x;
  xmin_edge = xmax_edge  = 1;
#pragma omp simd safelen(8)
  for (i = 1; i<n; i++) {
    x = c_X[i];
    x = max(min(x,nx),1);
    if (x == xmin) {  
       xmin_edge++;
    }
    if (x == xmax) {
       xmax_edge++;
    }
    else if (x < xmin) {
       xmin = x;
       xmin_edge = 1;
    }
    else if (x > xmax) {
       xmax = x;
       xmax_edge = 1;
    }

  }
    x_max = xmax_edge;
    x_min = xmin_edge;
}

   

Attachment: if-conv.patch1.1
Description: Binary data

Attachment: if-conv.patch2.1
Description: Binary data

Reply via email to