Hi Richard,

Thanks a lot for your great comments!
on 2020/6/2 7:50 PM, Richard Sandiford wrote:
> "Kewen.Lin" <li...@linux.ibm.com> writes:
>> Hi Richard,
>>
>> on 2020/5/29 4:32 PM, Richard Sandiford wrote:
>>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>>> on 2020/5/27 6:02 PM, Richard Sandiford wrote:
>>>>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>>>>> Hi Richard,
>>>>>>
snip ...
>
> It would be easier to review, and perhaps easier to bisect,
> if some of the mechanical changes were split out.  E.g.:
>
> - Rename can_fully_mask_p to can_use_partial_vectors_p.
>
> - Rename fully_masked_p to using_partial_vectors_p.
>
> - Rename things related to rgroup_masks.  I think “rgroup_controls”
>   or “rgroup_guards” might be more descriptive than “rgroup_objs”.
>
> These should be fairly mechanical changes and can happen ahead of
> the main series.  It'll then be easier to see what's different
> for masks and lengths, separately from the more mechanical stuff.
>

Good suggestion.  My fault, I should have done this earlier.  I will
split these out into some NFC patches.

> As far as:
>
> +  union
> +    {
> +      /* The type of mask to use, based on the highest nS recorded above.  */
> +      tree mask_type;
> +      /* Any vector type to use these lengths.  */
> +      tree vec_type;
> +    };
>
> goes, some parts of the code seem to use mask_type for lengths too,
> which I'm a bit nervous about.  I think we should either be consistent
> about which union field we use (always mask_type for masks, always
> vec_type for lengths) or we should just rename mask_type to something
> more generic.  Just "type" might be good enough with a suitable comment.

Will fix it.
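Concretely, together with the rgroup_controls renaming you suggested,
I would drop the union in favour of a single generic field with one
comment, something like the rough sketch below (names and comment
wording are tentative, not final):

/* Sketch only: rgroup_objs renamed to rgroup_controls, with the
   union replaced by one generic field.  */
struct rgroup_controls {
  /* The largest nS for all the statements in the group.  */
  unsigned int max_nscalars_per_iter;

  /* The type of control to use: a mask type when masking, or the
     vector type that the lengths are measured against when using
     lengths.  */
  tree type;

  /* The controls themselves (masks or lengths).  */
  vec<tree> controls;
};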
>
>> {
>>   tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
>>   tree iv_type = LOOP_VINFO_MASK_IV_TYPE (loop_vinfo);
>> -  tree mask_type = rgm->mask_type;
>> -  unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
>> -  poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
>> +
>> +  bool vect_for_masking = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>> +  if (!vect_for_masking)
>> +    {
>> +      /* Obtain target supported length type.  */
>> +      scalar_int_mode len_mode = targetm.vectorize.length_mode;
>> +      unsigned int len_prec = GET_MODE_PRECISION (len_mode);
>> +      compare_type = build_nonstandard_integer_type (len_prec, true);
>> +      /* Simply set iv_type as same as compare_type.  */
>> +      iv_type = compare_type;
>
> This might not be the best time to bring this up :-) but it seems
> odd to be asking the target for the induction variable type here.
> I got the impression that the hook was returning DImode, whereas
> the PowerPC instructions only looked at the low 8 bits of the length.
> If so, forcing a naturally 32-bit IV to DImode would insert extra
> sign/zero extensions, even though the input to the length intrinsics
> would have been happy with the 32-bit IV.
>

Good point, I'll check it with some cases.  As Segher pointed out,
the length is taken from bits 0-7 (the top 8 bits, abnormal I admit),
and these vector-with-length instructions are guarded as 64-bit only.
IIUC, with the current setting the extra sign/zero extensions would
only exist in the preheader?  At that time I thought an IV with less
precision than the length mode would have to be converted later for
the length's need anyway, so it seemed fine to simply use the length
mode.

> I think it would make sense to ask the target for its minimum
> precision P (which would be 8 bits if the above is correct).
> The starting point would then be the maximum of:
>
> - this P
> - the IV's natural precision
> - the precision needed to hold:
>     the maximum number of scalar iterations multiplied by the scale factor
>     (to convert scalar counts to bytes)
>
> If the IV might wrap at that precision without producing all-zero lengths,
> it would be worth doubling the precision to avoid the wrapping issue,
> provided that we don't go beyond BITS_PER_WORD.
>

Thanks!  Will think about and test this part more (a rough sketch of
my current understanding follows further below).

>> +      tree obj_type = rgo->mask_type;
>> +      /* Here, take nscalars_per_iter as nbytes_per_iter for length.  */
>> +      unsigned int nscalars_per_iter = rgo->max_nscalars_per_iter;
>
> I think whether we count scalars or count bytes is really a separate
> decision that shouldn't be tied directly to using lengths.  Length-based
> loads and stores on other arches might want to count scalars too.
> I'm not saying you should add support for that (it wouldn't be tested),
> but I think we should avoid structuring the code to make it harder to
> add in future.
>

It makes sense, will update it.

> So I think nscalars_per_iter should always count scalars and anything
> length-based should be separate.  Would it make sense to store the
> length scale factor as a separate field?  I.e. using the terms
> above the rgroup_masks comment, the length IV step is:
>
>   factor * nS * VF == factor * nV * nL
>

Yeah, factor * nS becomes what we want for the length-based approach
(a count in bytes), and factor * nL would be the vector size in bytes.

> That way, applying the factor becomes separate from lengths vs. masks.
> The factor would also be useful in calculating the IV precision above.
>

Yeah, nice!

>> [...]
>> -/* Make LOOP iterate NITERS times using masking and WHILE_ULT calls.
>> -   LOOP_VINFO describes the vectorization of LOOP.  NITERS is the
>> -   number of iterations of the original scalar loop that should be
>> -   handled by the vector loop.  NITERS_MAYBE_ZERO and FINAL_IV are
>> -   as for vect_set_loop_condition.
>> +/* Make LOOP iterate NITERS times using objects like masks (and
>> +   WHILE_ULT calls) or lengths.  LOOP_VINFO describes the vectorization
>> +   of LOOP.  NITERS is the number of iterations of the original scalar
>> +   loop that should be handled by the vector loop.  NITERS_MAYBE_ZERO
>> +   and FINAL_IV are as for vect_set_loop_condition.
>>
>>    Insert the branch-back condition before LOOP_COND_GSI and return the
>>    final gcond.  */
>>
>>  static gcond *
>> -vect_set_loop_condition_masked (class loop *loop, loop_vec_info loop_vinfo,
>> -                                tree niters, tree final_iv,
>> -                                bool niters_maybe_zero,
>> -                                gimple_stmt_iterator loop_cond_gsi)
>> +vect_set_loop_condition_partial (class loop *loop, loop_vec_info loop_vinfo,
>> +                                 tree niters, tree final_iv,
>> +                                 bool niters_maybe_zero,
>> +                                 gimple_stmt_iterator loop_cond_gsi)
>>  {
>>    gimple_seq preheader_seq = NULL;
>>    gimple_seq header_seq = NULL;
>>
>> +  bool vect_for_masking = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>> +
>>    tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
>> +  if (!vect_for_masking)
>> +    {
>> +      /* Obtain target supported length type as compare_type.  */
>> +      scalar_int_mode len_mode = targetm.vectorize.length_mode;
>> +      unsigned len_prec = GET_MODE_PRECISION (len_mode);
>> +      compare_type = build_nonstandard_integer_type (len_prec, true);
>
> Same comment as above about the choice of IV type.  We shouldn't
> recalculate this multiple times.  It would be better to calculate
> it upfront and store it in the loop_vinfo.

OK.
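To double check my understanding of the precision suggestion, the
upfront calculation I have in mind is roughly the sketch below.  The
helper name vect_min_prec_for_max_niters is made up by me here, and
how the target reports its minimum precision P is still to be settled:

/* Sketch only: the minimum precision needed to hold the maximum
   number of scalar iterations multiplied by the scale FACTOR
   (to convert scalar counts to bytes).  */

unsigned int
vect_min_prec_for_max_niters (loop_vec_info loop_vinfo, unsigned int factor)
{
  /* The maximum iteration count representable in the niters type.  */
  tree ni_type = TREE_TYPE (LOOP_VINFO_NITERSM1 (loop_vinfo));
  widest_int max_ni = wi::to_widest (TYPE_MAX_VALUE (ni_type)) + 1;

  /* Refine it with the estimate from loop analysis if available.  */
  widest_int max_back_edges;
  if (max_loop_iterations (LOOP_VINFO_LOOP (loop_vinfo), &max_back_edges))
    max_ni = wi::smin (max_ni, max_back_edges + 1);

  /* Work out how many bits we need to represent the limit.  */
  return wi::min_precision (max_ni * factor, UNSIGNED);
}

The final precision would then be the maximum of this value, the IV's
natural precision and the target's minimum precision P, doubled if
wrapping is a concern (but capped at BITS_PER_WORD), computed once and
stored in the loop_vinfo.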
>
>> @@ -2567,7 +2622,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>>    if (vect_epilogues
>>        && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>>        && prolog_peeling >= 0
>> -      && known_eq (vf, lowest_vf))
>> +      && known_eq (vf, lowest_vf)
>> +      && !LOOP_VINFO_FULLY_WITH_LENGTH_P (epilogue_vinfo))
>
> Why's this check needed?
>

It's mainly for the length-based epilogue handling.

  while (!(constant_multiple_p (GET_MODE_SIZE (loop_vinfo->vector_mode),
                                GET_MODE_SIZE (epilogue_vinfo->vector_mode),
                                &ratio)
           && eiters >= lowest_vf / ratio + epilogue_gaps))

This "if" arm checks whether the remaining eiters are enough for the
epilogue; if eiters is less than the epilogue's lowest_vf, it backs
the epilogue out.  But for partial vectors this should be acceptable,
since they can deal with the partial iteration.  Probably I should use
using_partial_vectors_p here instead of LOOP_VINFO_FULLY_WITH_LENGTH_P;
although masking won't have the possibility to handle the epilogue,
the concept would be the same.

>>      {
>>        unsigned HOST_WIDE_INT eiters
>>          = (LOOP_VINFO_INT_NITERS (loop_vinfo)
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 80e33b61be7..99e6cb904ba 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -813,8 +813,10 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>>      vec_outside_cost (0),
>>      vec_inside_cost (0),
>>      vectorizable (false),
>> -    can_fully_mask_p (true),
>> +    can_partial_vect_p (true),
>
> I think “can_use_partial_vectors_p” reads better
>

Will update with it.

>
>>      fully_masked_p (false),
>> +    fully_with_length_p (false),
>
> I think it would be better if these two were a single flag
> (using_partial_vectors_p), with masking vs. lengths being derived
> information.
>

Will update it.

>> +    epil_partial_vect_p (false),
>>      peeling_for_gaps (false),
>>      peeling_for_niter (false),
>>      no_data_dependencies (false),
>> @@ -880,13 +882,25 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>>  void
>>  release_vec_loop_masks (vec_loop_masks *masks)
>>  {
>> -  rgroup_masks *rgm;
>> +  rgroup_objs *rgm;
>>    unsigned int i;
>>    FOR_EACH_VEC_ELT (*masks, i, rgm)
>> -    rgm->masks.release ();
>> +    rgm->objs.release ();
>>    masks->release ();
>>  }
>>
>> +/* Free all levels of LENS.  */
>> +
>> +void
>> +release_vec_loop_lens (vec_loop_lens *lens)
>> +{
>> +  rgroup_objs *rgl;
>> +  unsigned int i;
>> +  FOR_EACH_VEC_ELT (*lens, i, rgl)
>> +    rgl->objs.release ();
>> +  lens->release ();
>> +}
>> +
>
> There's no need to duplicate this function.
>

Good catch, will rename and merge them (a quick sketch is in the PS
below).

> The overall approach looks good though.  I just think we need to work
> through the details a bit more.
>
> Thanks,
> Richard
>

BR,
Kewen
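PS: for the record, the merged release helper could be as simple as
the sketch below, assuming the rgroup_controls renaming; the function
name is tentative:

/* Free all levels of rgroup CONTROLS (shared by masks and lengths).  */

void
release_vec_loop_controls (vec<rgroup_controls> *controls)
{
  rgroup_controls *rgc;
  unsigned int i;
  FOR_EACH_VEC_ELT (*controls, i, rgc)
    rgc->controls.release ();
  controls->release ();
}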