Richard Biener <richard.guent...@gmail.com> writes: > On Tue, Jun 23, 2020 at 11:53 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> Things have moved on due to the IRC conversation, but… >> >> "Kewen.Lin" <li...@linux.ibm.com> writes: >> > on 2020/6/23 上午3:59, Richard Sandiford wrote: >> >> "Kewen.Lin" <li...@linux.ibm.com> writes: >> >>> @@ -5167,6 +5167,24 @@ mode @var{n}. >> >>> >> >>> This pattern is not allowed to @code{FAIL}. >> >>> >> >>> +@cindex @code{lenload@var{m}} instruction pattern >> >>> +@item @samp{lenload@var{m}} >> >>> +Perform a vector load with length from memory operand 1 of mode @var{m} >> >>> +into register operand 0. Length is provided in register operand 2 with >> >>> +appropriate mode which should afford the maximal required precision of >> >>> +any available lengths. >> >> >> >> I think we need to say in more detail what “load with length” actually >> >> means. How about: >> >> >> >> Load the number of bytes specified by operand 2 from memory operand 1 >> >> into register operand 0, setting the other bytes of operand 0 to >> >> undefined values. Operands 0 and 1 have mode @var{m}. Operand 2 has >> >> whichever integer mode the target prefers. >> >> >> > >> > Thanks for nice wordings! Updated, for "... to undefined values" I >> > changed it >> > to "... to undefined values or zeros" as Segher's comments to match the >> > behavior >> > on Power. >> >> “set … to undefined values” means that the values are not defined by >> the optab interface. In other words, the target can set the bytes >> to whatever it wants, and gimple code can't make any assumptions about >> what the values of the bytes are. >> >> So setting the bytes to zero (as Power does) would conform to the >> interface. So would leaving the bytes in operand 0 untouched. >> So would using an instruction that really does leave the other >> bytes with undefined values, etc. >> >> So I think we should keep it as just “… to undefined values”, >> >> The alternative would be to define the interface so that targets *must* >> ensure that the other bytes are zeros. But at the moment, the only >> intended use of the optabs and ifns is for autovectorisation, and the >> vectoriser won't care about the values of “inactive” bytes/lanes. >> Forcing the target to set them to a specific value like zero would be >> unnecessarily restrictive. > > Actually it _does_ care.
I'd argue it doesn't, but for essentially the same reasons :-) > This is supposed to be used for fully masked > loops and 'unspecified values' would require us to explicitely zero > them for any FP op because of possible sNaN representations. It > also precludes us from bitwise ORing in an appropriately masked > vector of 1s to make integer division happy (OK, no vector ISA supports > integer division). Zeros would be a problem for FP division too. And even if we require loads to set inactive lanes to zero, we couldn't infer from that that any given FP addition (say) won't raise an exception. E.g. the inputs could be the result of converting integers and adding them could trigger an inexact exception. Or the values could be the result of simple bitcasts, giving arbitrary FP values. (AIUI, current bfloat code works this way.) The vectoriser currently only allows potentially-trapping FP operations on partial vectors if the target provides an appropriate IFN_COND_* function. (That's one of the main use cases for those functions.) In other cases it requires the loop to operate on full vectors. This should be relaxed in future to support inbranch partial vectorisation of simd calls. This means that the current patch series will/should simply punt for “length”-based loop control if the loop contains FP operations that (as far as gimple is concerned) might trap. If we're thinking about how to relax that, then IMO it will need to be done either at the level of each FP operation or by some kind of “global” vectorisation subpass that introduces known-safe values for inactive lanes. The first would be easier, the second would be more optimal. I don't think that's specific to “length” vectorisation though. The same concerns apply to if-converted loops that operate on full vectors. I think the approach would be essentially the same for both. In that scenario, removing zeroing of an IFN_LEN_LOAD would “just” be an optimisation, and could potentially be left to RTL code if necessary. (But see my main point below.) SVE supports integer division btw. :-) > So unless we have evidence that there exists an ISA that does _not_ > zero the excess bits I'd rather specify it does. I think the known architectures that might use this are: - MVE - Power - RVV MVE and Power both set inactive lanes to zero. But I'm not sure about RVV. AIUI, for RVV the approach instead would be to reduce the effective vector length for the final iteration of the vector loop, and I'm not sure whether in that situation it makes sense to say that the other elements still exist and are guaranteed to be zero. I'm the last person who should be speculating on that though. Let's see whether Jim has any comments. In summary, I'm not saying we should never define the inactive values to be zero. I just think that we should leave it until it matters. And I don't think it does/should matter for the current patch series. IFN_MASK_LOAD has been around for quite a long time now and we've never had to define the values of inactive lanes there. Thanks, Richard