on 2020/5/27 3:25 PM, Richard Biener wrote:
> On Tue, 26 May 2020, Segher Boessenkool wrote:
>
>> Hi!
>>
>> On Tue, May 26, 2020 at 01:29:30PM +0100, Richard Sandiford wrote:
>>> FWIW, I agree adding .LEN_LOAD and .LEN_STORE seems like a good
>>> approach. I think it'll be more maintainable in the long run than
>>> trying to have .MASK_LOADs and .MASK_STOREs that need a special mask
>>> operand. (That would be too similar to VEC_COND_EXPR :-))
>>>
>>> Not sure yet what the exact semantics wrt out-of-range values for
>>> the IFN/optab though. Maybe we should instead have some kind of
>>> abstract, target-specific cookie created by a separate intrinsic.
>>> Haven't thought much about it yet...
>>
>> Or maybe only support 0..N with N the length of the vector? It is
>> pretty important to support 0 and N, but greater than N isn't as
>> important (it is useful for tricky hand-written code, but not as much
>> for compiler-generate code -- we only support an 8-bit number here on
>> Power, maybe that is why ;-) )
>
> The question is one of semantics - if power masks the length to an
> 8 bit number it's important to preprocess the IV. As with my
> other suggestion the question is what to expose to the IL (to GIMPLE)
> here. Exposing as much as possible will help IV selection but
> will eventually require IFN variations for different semantics.
>
In the current implementation, we don't use an IFN for the length
computation; it generates something like:

  ivtmp_28 = ivtmp_27 + 16;
  _39 = MIN_EXPR <ivtmp_28, _32>; // _32 is the limit
  _40 = _32 - _39;                // remaining bytes, zero at the end
  _41 = MIN_EXPR <_40, 16>;       // cap at the vector size
  if (ivtmp_28 < _32)

My initial thought was that the len load/store IFNs accept any length
(any value representable in the length mode). Since a length larger
than the vector size makes no sense, the hardware can treat it as
saturated to the vector size. If the hardware masks the length to some
bits, as ppc does, we can add a hook to guard the MIN requirement for
length generation. For now the MIN is mandatory, since ppc is the only
user.

FWIW, if we mostly adopt this for epilogues or small loops (iteration
count < VF), the range can be analyzed at compile time, so these MIN
computations can in theory be optimized out.

> So yes, 0..N sounds about right here and we'll require a MIN ()
> operation and likely need to teach IV selection about this to at least
> possibly get an IV with the byte size multiplication factored.
>

FWIW, in the current implementation the step and limit are already
multiplied by the lane size in bytes, so the IV computation itself does
not contain that multiplication.

BR,
Kewen

> Richard.
>
>>
>> Segher
>>
>
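
For illustration only, the length computation described in the reply above can be sketched in plain C. This is a scalar model under stated assumptions, not the vectorizer's actual output: `VEC_BYTES`, `min_sz`, `len_copy` and `copy_with_len` are hypothetical names, and `len_copy` merely stands in for a length-controlled load/store pair by copying the first `len` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_BYTES 16 /* assumed vector size in bytes */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical stand-in for a len load followed by a len store:
   only the first LEN bytes of the vector are touched.  */
static void len_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(len <= VEC_BYTES); /* lengths are kept within 0..N */
    memcpy(dst, src, len);
}

/* Copy LIMIT bytes in vector-sized steps and return the iteration count.
   The IV and the limit are already in bytes, so no multiplication by the
   lane size appears in the loop.  */
size_t copy_with_len(unsigned char *dst, const unsigned char *src, size_t limit)
{
    size_t iv = 0;
    size_t len = min_sz(limit, VEC_BYTES); /* length for the first iteration */
    size_t iters = 0;

    if (limit == 0)
        return 0;

    for (;;) {
        len_copy(dst + iv, src + iv, len);
        iters++;

        size_t next = iv + VEC_BYTES;         /* ivtmp_28 = ivtmp_27 + 16 */
        size_t clamped = min_sz(next, limit); /* _39 = MIN_EXPR <ivtmp_28, _32> */
        size_t remaining = limit - clamped;   /* _40 = _32 - _39 */
        len = min_sz(remaining, VEC_BYTES);   /* _41 = MIN_EXPR <_40, 16> */

        if (!(next < limit))                  /* if (ivtmp_28 < _32) */
            break;
        iv = next;
    }
    return iters;
}
```

With a limit of 37 bytes this model runs three iterations with lengths 16, 16 and 5. The two `min_sz` calls in the loop are the MIN operations that could be dropped when the range of the limit is known at compile time.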