Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Oct 15, 2019 at 8:07 AM Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>>
>> On Wed, 9 Oct 2019 at 08:14, Prathamesh Kulkarni
>> <prathamesh.kulka...@linaro.org> wrote:
>> >
>> > On Tue, 8 Oct 2019 at 13:21, Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> > >
>> > > Leaving the main review to Richard, just some comments...
>> > >
>> > > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > > > @@ -9774,6 +9777,10 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
>> > > >
>> > > >     When STMT_INFO is vectorized as a nested cycle, for_reduction is true.
>> > > >
>> > > > +   For COND_EXPR<C, T, E> if T comes from masked load, and is conditional
>> > > > +   on C, we apply loop mask to result of vector comparison, if it's present.
>> > > > +   Similarly for E, if it is conditional on !C.
>> > > > +
>> > > >     Return true if STMT_INFO is vectorizable in this way.  */
>> > > >
>> > > >  bool
>> > >
>> > > I think this is a bit misleading.  But IMO it'd be better not to have
>> > > a comment here and just rely on the one in the main function body.
>> > > This optimisation isn't really changing the vectorisation strategy,
>> > > and the comment could easily get forgotten if things change in future.
>> > >
>> > > > [...]
>> > > > @@ -9999,6 +10006,35 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>> > > >    /* Handle cond expr.  */
>> > > >    for (j = 0; j < ncopies; j++)
>> > > >      {
>> > > > +      tree loop_mask = NULL_TREE;
>> > > > +      bool swap_cond_operands = false;
>> > > > +
>> > > > +      /* Look up if there is a loop mask associated with the
>> > > > +      scalar cond, or it's inverse.  */
>> > >
>> > > Maybe:
>> > >
>> > >    See whether another part of the vectorized code applies a loop
>> > >    mask to the condition, or to its inverse.
>> > >
>> > > > +
>> > > > +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>> > > > +     {
>> > > > +       scalar_cond_masked_key cond (cond_expr, ncopies);
>> > > > +       if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>> > > > +         {
>> > > > +           vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>> > > > +           loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
>> > > > +         }
>> > > > +       else
>> > > > +         {
>> > > > +           bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
>> > > > +           cond.code = invert_tree_comparison (cond.code, honor_nans);
>> > > > +           if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>> > > > +             {
>> > > > +               vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>> > > > +               loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
>> > > > +                                               vectype, j);
>> > > > +               cond_code = cond.code;
>> > > > +               swap_cond_operands = true;
>> > > > +             }
>> > > > +         }
>> > > > +     }
>> > > > +
>> > > >        stmt_vec_info new_stmt_info = NULL;
>> > > >        if (j == 0)
>> > > >       {
>> > > > @@ -10114,6 +10153,47 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>> > > >                   }
>> > > >               }
>> > > >           }
>> > > > +
>> > > > +       /* If loop mask is present, then AND it with
>> > >
>> > > Maybe "If we decided to apply a loop mask, ..."
>> > >
>> > > > +          result of vec comparison, so later passes (fre4)
>> > >
>> > > Probably better not to name the pass -- could easily change in future.
>> > >
>> > > > +          will reuse the same condition used in masked load.
>> > >
>> > > Could be a masked store, or potentially other things too.
>> > > So maybe just "will reuse the masked condition"?
>> > >
>> > > > +
>> > > > +          For example:
>> > > > +          for (int i = 0; i < 100; ++i)
>> > > > +            x[i] = y[i] ? z[i] : 10;
>> > > > +
>> > > > +          results in the following optimized GIMPLE:
>> > > > +
>> > > > +          mask__35.8_43 = vect__4.7_41 != { 0, ... };
>> > > > +          vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
>> > > > +          _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 0B];
>> > > > +          vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
>> > > > +          vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
>> > > > +                                            vect_iftmp.11_47, { 10, ... }>;
>> > > > +
>> > > > +          instead of recomputing vec != { 0, ... } in vec_cond_expr  */
>> > >
>> > > That's true, but gives the impression that avoiding the vec != { 0, ... }
>> > > is the main goal, whereas we could do that just by forcing a 
>> > > three-operand
>> > > COND_EXPR.  It's really more about making sure that vec != { 0, ... }
>> > > and its masked form aren't both live at the same time.  So maybe:
>> > >
>> > >              instead of using masked and unmasked forms of
>> > >              vect__4.7_41 != { 0, ... } (masked in the MASK_LOAD,
>> > >              unmasked in the VEC_COND_EXPR).  */
>> > >
>> > Hi Richard,
>> > Thanks for the suggestions; I have updated the comments in the attached patch.
>> Hi,
>> The attached patch is rebased on trunk, and after the PR91532 fix, the
>> hunk for fmla_2.c is no longer required.
>
> Hmm.  So we already record some mask info - you just add the scalar
> predicate representing the mask on top of that.  I wonder if you can
> integrate that into the existing vec_loop_masks vector instead of
> adding another data structure on the side?  Not that I understand
> the existing fully masked code at all (or specifically what it computes
> as nscalars_per_iter, etc. ... :/).

We can AND several different scalar conditions with the same loop
mask (that's relatively common), and could even AND the same scalar
condition with different loop masks (although that's less likely).
So I think having separate info makes sense.
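For reference, the key for that separate info only needs to capture the
scalar condition (as a comparison code plus operands) and the number of
copies.  A minimal C++ sketch, going by the quoted hunks that use
cond.code and cond.op0 (the op1 field and the hash traits are my
assumptions, not necessarily what the patch does):

struct scalar_cond_masked_key
{
  scalar_cond_masked_key (tree t, unsigned ncopies_)
    : ncopies (ncopies_)
  {
    /* Decompose T into a comparison code and its operands.  */
    get_cond_ops_from_tree (t);
  }

  void get_cond_ops_from_tree (tree);

  unsigned ncopies;
  tree_code code;
  tree op0;
  tree op1;
};

/* Record which scalar conditions already have a loop mask applied
   somewhere in the vectorized code.  A default_hash_traits
   specialization hashing code/op0/op1/ncopies is assumed but not
   shown here.  */
typedef hash_set<scalar_cond_masked_key> scalar_cond_masked_set_type;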

> At least add the new vinfo member right next to the other mask-related
> fields.

Agree that would be better.

> @@ -10122,6 +10157,48 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>                   }
>               }
>           }
> +
> +       /* If we decided to apply a loop mask to result of vec
> +          comparison, so later passes will reuse the same condition.

Maybe:

          /* If we decided to apply a loop mask to the result of the vector
             comparison, AND the comparison with the mask now.  Later passes
             should then be able to reuse the AND results between multiple
             vector statements.

> +          For example:
> +          for (int i = 0; i < 100; ++i)
> +            x[i] = y[i] ? z[i] : 10;
> +
> +          results in the following optimized GIMPLE:
> +
> +          mask__35.8_43 = vect__4.7_41 != { 0, ... };
> +          vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> +          _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 0B];
> +          vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> +          vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
> +                                            vect_iftmp.11_47, { 10, ... }>;
> +
> +          instead of using masked and unmasked forms of
> +             vec != { 0, ... } (masked in the MASK_LOAD,
> +             unmasked in the VEC_COND_EXPR).  */

The last paragraph uses some space rather than tab indentation.
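The hunk that actually applies the mask isn't quoted above, but
presumably it amounts to something like the following once a loop mask
has been found (vec_compare and vec_cmp_type are assumed names for the
vector comparison and its type):

if (loop_mask)
  {
    /* AND the vector comparison with the loop mask so that only the
       masked form of the condition stays live.  */
    tree tmp = make_ssa_name (vec_cmp_type);
    gassign *g = gimple_build_assign (tmp, BIT_AND_EXPR,
                                      vec_compare, loop_mask);
    vect_finish_stmt_generation (stmt_info, g, gsi);
    vec_compare = tmp;
  }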

> +/* If code(T) is comparison op or def of comparison stmt,
> +   extract it's operands.
> +   Else return <NE_EXPR, T, 0>.  */
> +
> +void
> +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
> +{

Maybe:

/* If the condition represented by T is a comparison or the SSA name
   result of a comparison, extract the comparison's operands.  Represent
   T as NE_EXPR <T, 0> otherwise.  */
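For readers without the patch to hand, a minimal sketch of a body
matching that comment, using the usual tree/GIMPLE accessors (the
details are my guess, not a quote from the patch):

void
scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
{
  /* T is a bare comparison such as a < b.  */
  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
    {
      this->code = TREE_CODE (t);
      this->op0 = TREE_OPERAND (t, 0);
      this->op1 = TREE_OPERAND (t, 1);
      return;
    }

  /* T is the SSA name result of a comparison statement.  */
  if (TREE_CODE (t) == SSA_NAME)
    if (gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t)))
      {
        tree_code cond_code = gimple_assign_rhs_code (stmt);
        if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
          {
            this->code = cond_code;
            this->op0 = gimple_assign_rhs1 (stmt);
            this->op1 = gimple_assign_rhs2 (stmt);
            return;
          }
      }

  /* Otherwise represent T as T != 0.  */
  this->code = NE_EXPR;
  this->op0 = t;
  this->op1 = build_zero_cst (TREE_TYPE (t));
}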

OK with those changes and the one Richard asked for.

Thanks,
Richard
