Hi Richard,

on 2020/7/7 下午6:15, Richard Sandiford wrote:
> "Kewen.Lin" <li...@linux.ibm.com> writes:
>> Hi Richard,
>>
>> on 2020/7/1 下午11:17, Richard Sandiford wrote:
>>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>>> on 2020/7/1 上午3:53, Richard Sandiford wrote:
>>>>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>>>
>>>> Sorry, I didn't quite follow this comment.  Both nscalars_totoal and
>>>> nscalars_step are scaled here.  The remaining related nscalars_*
>>>> seems only nscalars_skip, but length can't support skip.
>>>
>>> Hmm, OK.  But in that case can you update the names of the variables
>>> to match?  It's confusing to have some nscalars_* variables actually
>>> count scalars (and thus have “nitems” equivalents) and other nscalars_*
>>> variables count something else (and thus effectively be nitems_* variables
>>> themselves).
>>>
>>
>> OK.  I'll update the names like nscalars_total/nscalars_step and equivalents
>> to nitems_total/... (or nunits_total better?)
> 
> I agree “items” isn't great.  I was trying to avoid “units” because GCC
> often uses that to mean bytes (BITS_PER_UNIT, UNITS_PER_WORD, etc.).
> In this context that could be confusing, because sometimes the
> “units” actually would be bytes, but not always.
> 

Got it!  Thanks!

[...]
>>> But in this particular test, we're doing outer loop vectorisation,
>>> and the only elements of vres that matter are the ones selected
>>> by loop_mask (since those are the only ones that get stored out).
>>> So applying the loop mask to the VEC_COND_EXPR is “just” an
>>> (important) optimisation, rather than a correctness issue.
>>>  
>>
>> Thanks for the clarification.  It looks the vres is always safe since its
>> further usage is guard with loop mask.  Then sorry that I didn't catch why
>> it is one optimization for this case, is there some difference in backend
>> supports on this different mask for cond_expr?
> 
> No, the idea of the optimisation is to avoid cases in which we have:
> 
>     cmp_res = …compare…
>     cmp_res' = cmp_res & loop_mask
>     IFN_MASK_LOAD (…, cmp_res')
>     z = cmp_res ? x : y
> 
> The problem here is that cmp_res and cmp_res' are live at the same time,
> which prevents cmp_res and cmp_res' from being combined into a single
> instruction.  It's better for the final instruction to be:
> 
>     z = cmp_res' ? x : y
> 
> so that everything uses the same comparison result.
> 
> We can't leave that to later passes because nothing in the gimple IL
> indicates that only the loop_mask elements of z matter.
> 

Nice, thanks for the explanation.


[...]
>>> What's causing the test to start failing with the patch?  I realise
>>> you've probably already said, sorry, but it's been a large patch series
>>> so it's hard to keep all the details committed to memory.
>>>
>>
>> No problem, appreciate your time much!  Since length-based partial vectors
>> doesn't support any reduction so far, the function has the responsibility
>> to disable use_partial_vectors_p for it.  Without the above else-if part,
>> since the reduction_type is TREE_CODE_REDUCTION for this case, the else part
>> will stop this case to use mask-based partial vectors, but the case expects
>> the outer loop still able to use mask-based partial vectors.
>>
>> As your clarification above, else-if looks wrong.  Probably we can change it
>> to check whether the current vectorization is for outer loop and the 
>> condition
>> stmt being handled is in the inner loop, we can allow it for partial vectors?
> 
> I think it's more whether, for outer loop vectorisation, the reduction
> is a double reduction or a simple nested-cycle reduction.  Both have
> a COND_EXPR in the inner loop, but the extra elements only matter for
> double reductions.
> 
> There again, I don't think we actually support double reductions for
> COND_EXPR reductions.
> 

OK.  I will send one separate patch with your suggestion on this.

[...]
>>
>> OK, but it seems it's acceptable if the IV wider than the precision since
>> we allows it out of range?
> 
> For example, suppose that a target handled out-of-range values but
> still had a QImode length.  If the IV was wider than QI, we'd truncate
> 0x100 to 0 when generating the pattern, so a full-vector access would
> get truncated to an empty-vector access.
> 

Yeah, it's so true.

BR,
Kewen

Reply via email to