Hi Richard, on 2020/7/7 下午6:15, Richard Sandiford wrote: > "Kewen.Lin" <li...@linux.ibm.com> writes: >> Hi Richard, >> >> on 2020/7/1 下午11:17, Richard Sandiford wrote: >>> "Kewen.Lin" <li...@linux.ibm.com> writes: >>>> on 2020/7/1 上午3:53, Richard Sandiford wrote: >>>>> "Kewen.Lin" <li...@linux.ibm.com> writes: >>>> >>>> Sorry, I didn't quite follow this comment. Both nscalars_totoal and >>>> nscalars_step are scaled here. The remaining related nscalars_* >>>> seems only nscalars_skip, but length can't support skip. >>> >>> Hmm, OK. But in that case can you update the names of the variables >>> to match? It's confusing to have some nscalars_* variables actually >>> count scalars (and thus have “nitems” equivalents) and other nscalars_* >>> variables count something else (and thus effectively be nitems_* variables >>> themselves). >>> >> >> OK. I'll update the names like nscalars_total/nscalars_step and equivalents >> to nitems_total/... (or nunits_total better?) > > I agree “items” isn't great. I was trying to avoid “units” because GCC > often uses that to mean bytes (BITS_PER_UNIT, UNITS_PER_WORD, etc.). > In this context that could be confusing, because sometimes the > “units” actually would be bytes, but not always. >
Got it! Thanks! [...] >>> But in this particular test, we're doing outer loop vectorisation, >>> and the only elements of vres that matter are the ones selected >>> by loop_mask (since those are the only ones that get stored out). >>> So applying the loop mask to the VEC_COND_EXPR is “just” an >>> (important) optimisation, rather than a correctness issue. >>> >> >> Thanks for the clarification. It looks the vres is always safe since its >> further usage is guard with loop mask. Then sorry that I didn't catch why >> it is one optimization for this case, is there some difference in backend >> supports on this different mask for cond_expr? > > No, the idea of the optimisation is to avoid cases in which we have: > > cmp_res = …compare… > cmp_res' = cmp_res & loop_mask > IFN_MASK_LOAD (…, cmp_res') > z = cmp_res ? x : y > > The problem here is that cmp_res and cmp_res' are live at the same time, > which prevents cmp_res and cmp_res' from being combined into a single > instruction. It's better for the final instruction to be: > > z = cmp_res' ? x : y > > so that everything uses the same comparison result. > > We can't leave that to later passes because nothing in the gimple IL > indicates that only the loop_mask elements of z matter. > Nice, thanks for the explanation. [...] >>> What's causing the test to start failing with the patch? I realise >>> you've probably already said, sorry, but it's been a large patch series >>> so it's hard to keep all the details committed to memory. >>> >> >> No problem, appreciate your time much! Since length-based partial vectors >> doesn't support any reduction so far, the function has the responsibility >> to disable use_partial_vectors_p for it. Without the above else-if part, >> since the reduction_type is TREE_CODE_REDUCTION for this case, the else part >> will stop this case to use mask-based partial vectors, but the case expects >> the outer loop still able to use mask-based partial vectors. >> >> As your clarification above, else-if looks wrong. Probably we can change it >> to check whether the current vectorization is for outer loop and the >> condition >> stmt being handled is in the inner loop, we can allow it for partial vectors? > > I think it's more whether, for outer loop vectorisation, the reduction > is a double reduction or a simple nested-cycle reduction. Both have > a COND_EXPR in the inner loop, but the extra elements only matter for > double reductions. > > There again, I don't think we actually support double reductions for > COND_EXPR reductions. > OK. I will send one separate patch with your suggestion on this. [...] >> >> OK, but it seems it's acceptable if the IV wider than the precision since >> we allows it out of range? > > For example, suppose that a target handled out-of-range values but > still had a QImode length. If the IV was wider than QI, we'd truncate > 0x100 to 0 when generating the pattern, so a full-vector access would > get truncated to an empty-vector access. > Yeah, it's so true. BR, Kewen