Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

Richard Biener Fri, 18 Sep 2015 05:18:12 -0700

On Tue, Sep 15, 2015 at 5:32 PM, Alan Hayward <alan.hayw...@arm.com> wrote:
>
>
> On 15/09/2015 13:09, "Richard Biener" <richard.guent...@gmail.com> wrote:
>
>>
>>Now comments on the patch itself.
>>
>>+      if (code == COND_EXPR)
>>+       *v_reduc_type = COND_REDUCTION;
>>
>>so why not simply use COND_EXPR instead of the new v_reduc_type?
>
> v_reduc_type is also dependant on check_reduction (which comes from
> !nested_cycle in vectorizable_reduction).
> It seemed messy to keep on checking for both those things throughout.
>
> In my patch to catch simpler condition reductions, I’ll be adding another
> value to this enum too. v_reduc_type will be set to this new value based
> on the same properties for COND_REDUCTION plus some additional constraints.
>
>>
>>+  if (check_reduction && code != COND_EXPR &&
>>+      vect_is_slp_reduction (loop_info, phi, def_stmt))
>>
>>&&s go to the next line
>
> ok
>
>>
>>+             /* Reduction of the max index and a reduction of the found
>>+                values.  */
>>+             epilogue_cost += add_stmt_cost (target_cost_data, 1,
>>+                                             vec_to_scalar, stmt_info, 0,
>>+                                             vect_epilogue);
>>
>>vec_to_scalar once isn't what the comment suggests.  Instead the
>>comment suggests twice what a regular reduction would do
>>but I guess we can "hide" the vec_to_scalar cost and "merge" it
>>with the broadcast.  Thus make the above two vector_stmt costs?
>>
>>+             /* A broadcast of the max value.  */
>>+             epilogue_cost += add_stmt_cost (target_cost_data, 2,
>>+                                             scalar_to_vec, stmt_info, 0,
>>+                                             vect_epilogue);
>>
>>comment suggests a single broadcast.
>
> I’ve made a copy/paste error here. Just need to swap the 1 and the 2.
>
>
>>
>>@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
>>          the final vector of induction results:  */
>>       exit_phi = NULL;
>>       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
>>-        {
>>+       {
>>          gimple use_stmt = USE_STMT (use_p);
>>          if (is_gimple_debug (use_stmt))
>>            continue;
>>
>>please avoid unrelated whitespace changes.
>
> Ok. I was changing “8 spaces” to a tab, but happy to revert.
>
>>
>>+      case COND_EXPR:
>>+       if (v_reduc_type == COND_REDUCTION)
>>+         {
>>...
>>+       /* Fall through.  */
>>+
>>       case MIN_EXPR:
>>       case MAX_EXPR:
>>-      case COND_EXPR:
>>
>>aww, so we could already handle COND_EXPR reductions?  How do they
>>differ from what you add?  Did you see if that path is even exercised
>>today?
>
> Today, COND_EXPRs are only supported when they are nested inside a loop.
> See the vect-cond-*.c tests.
> For example:
>
> for (j = 0; j < M; j++)
>     {
>   x = x_in[j];
>   curr_a = a[0];
>
>     for (i = 0; i < N; i++)
>       {
>         next_a = a[i+1];
>         curr_a = x > c[i] ? curr_a : next_a;
>       }
>   x_out[j] = curr_a;
> }
>
>
> In that case, only the outer loop is vectorised.
>
>>
>>+           /* Create a vector of {init_value, 0, 0, 0...}.  */
>>+           vec<constructor_elt, va_gc> *v;
>>+           vec_alloc (v, nunits);
>>+           CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
>>+           if (SCALAR_FLOAT_TYPE_P (scalar_type))
>>+             for (i = 1; i < nunits; ++i)
>>+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>>+                                       build_real (scalar_type,
>>dconst0));
>>+           else
>>+             for (i = 1; i < nunits; ++i)
>>+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>>+                                       build_int_cst (scalar_type, 0));
>>+           init_def = build_constructor (vectype, v);
>>
>>you can unify the float/int case by using build_zero_cst (scalar_type).
>>Note that you should build a vector constant instead of a constructor
>>if init_val is a constant.  The convenient way is to build the vector
>>elements into a tree[] array and use build_vector_stat in that case.
>
> Ok, will switch to build_zero_cst.
> Also, I will switch my vector to  {init_value, init_value, init_value…}.
> I had {init_value, 0, 0, 0…} because I was going have the option of
> ADD_REDUC_EXPR,
> But that got removed along the way.


You can then simply use build_vector_from_val ().

>>
>>+      /* Find maximum value from the vector of found indexes.  */
>>+      tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");
>>
>>just use make_ssa_name (index_scalar_type);
>
> Ok
>
>>
>>+      /* Convert the vector of data to the same type as the EQ.  */
>>+      tree vec_data_cast;
>>+      if ( TYPE_UNSIGNED (index_vec_type))
>>+       {
>>
>>How come it never happens the element
>>sizes do not match?  (int index type and double data type?)
>
> This was a little unclear.
> The induction index is originally created as an unsigned version of the
> type as the data vector.
> (see the definition of cr_index_vector_type in vectorizable_reduction(),
> which is then used to create cond_name)
>
> I will remove the if and replace with a gcc_checking_assert(TYPE_UNSIGNED
> (index_vec_type))
>
>
>>
>>+      /* Where the max index occured, use the value from the data
>>vector.  */
>>+      tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL,
>>"");
>>+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
>>+                                                vec_compare,
>>vec_data_cast);
>>
>>that is, don't you need to do some widening/shortening on the comparison
>>result?
>>(also what happens in the case "or all of the values"?)
>>
>>Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)
>
> Given the induction index is the same size as the data, this will be ok.
>
> I don’t like the VIEW_CONVERT_MAGIC either! But I couldn’t see any other
> way of making it work.
>
>
> The case of "all of the values” is when there were no matches in the loop.
> When this happens, the data vector will contain only the default values
> {init_value, 0, 0, 0…}
> (… once I change my code due to the previous comment, we’ll have
> {init_value, init_value, init_value…} ).
> The REDUC_MAX_EXPR will turn either of those vectors into a “init_value”.
>
>>
>>+   This function also handles reduction of condition expressions, for
>>example:
>>+     for (int i = 0; i < N; i++)
>>+       if (a[i] < value)
>>+        last = a[i];
>>+   This is handled by vectorising the loop and creating an additional
>>vector
>>+   containing the loop indexes for which "a[i] < value" was true.  In the
>>+   function epilogue this is reduced to a single max value and then used
>>to
>>+   index into the vector of results.
>>
>>I miss a comment that shows the kind of code we transform this to.
>>"an additional vector containing the loop indexes" can't work - the vector
>>will not be large enough ;)  Naiively I would have made 'last' a vector,
>>performing the reduction element-wise and in the epilogue reduce
>>'last' itself.  And it looks like we are already doing that for
>>
>>int foo (int *a)
>>{
>>  int val = 0;
>>  for (int i = 0; i < 1024; ++i)
>>    if (a[i] > val)
>>      val = a[i];
>>  return val;
>>}
>>
>>I must be missing something.  Yes, I think we can't do index reduction
>>yet,
>>but pr65947-10.c is alrady handled?
>
>
> I think an easier way of looking thinking about this is to see what
> happens each iteration of the loop:
>
> Assuming the example above and a vector length of 4.
> The index vector starts as (0,0,0,0).
> On the first iteration, the condition passes for, lets say, i=2 and i=3,
> then index vector is (0,0,2,3)
> on the next iteration, the condition passes for i=4, then index vector is
> now (4,0,2,3)
> on the next iteration, the condition passes for i=8 and i=9, then index
> vector is now (8,9,2,3)
> Etc
> In the end we might end up with something like index vector=(26,2,54,14).
> The “2” has not been set since the first iteration.
> The only value we care about is the last one set - the 54. The rest are
> meaningless to us.
> At the same time the data vector has been storing values at the same time
> as the index vector, and so now contains (a[26], a[2], a[54], a[14])
>
> In the epilogue, we reduce the index vector down to “54”, then create a
> vector (54,54,54,54).
> This is matched against the index vector to give us
> (false,false,true,false).
> Then we use that vector to index the data vector, giving us (0, 0, a[54],
> 0).
> Which we then reduce down to a single value.

Ok, I see.

That this case is already vectorized is because it implements MAX_EXPR,
modifying it slightly to

int foo (int *a)
{
  int val = 0;
  for (int i = 0; i < 1024; ++i)
    if (a[i] > val)
      val = a[i] + 1;
  return val;
}

makes it no longer handled by current code.

>
>
>
>
>>
>>Stopping here.
>
> Thanks for the comments so far.
> I’ll put together a new patch with the changes above.
>
>
> Alan.
>
>

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

Reply via email to