Richard Biener <rguent...@suse.de> writes: > On Thu, 7 May 2020, Richard Sandiford wrote: > >> Richard Biener <rguent...@suse.de> writes: >> > This implements patterns combining vector element insertion of >> > vector element extraction to a VEC_PERM_EXPR of both vectors >> > when supported. Plus it adds the more generic identity transform >> > of inserting a piece of itself at the same position. >> > >> > Richard - is there anything I can do to make this SVE aware? >> > I'd need to construct an identity permute and "insert" into >> > that permute that element from the other (or same) vector. >> > I suppose for most element positions that won't work but >> > at least inserting at [0] should? I'm mostly struggling >> > on how to use vec_perm_builder here when nelts is not constant, >> > since it's derived from vec<> can I simply start with >> > a single pattern with 1 stride and then insert by using []? >> >> I guess for SVE we still want to know that the range is safe >> for all VL, so after dropping the is_constant check, we'd >> want something like: >> >> { >> poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type); >> unsigned int min_nelts = constant_lower_bound (nelts); >> } >> (if (... >> && at + elemsz <= min_nelts) >> >> In theory (hah) it should then just be a case of changing the >> vec_perm_builder constructor to: >> >> vec_perm_builder sel (nelts, min_nelts, 3); >> >> and then iterating over min_nelts * 3 instead of nelts here: >> >> > + for (unsigned i = 0; i < nelts; ++i) >> > + sel.quick_push (i / elemsz == at >> > + ? nelts + elem * elemsz + i % elemsz : i); >> >> So as far as the encoding goes, the first min_nelts elements are arbitrary >> values, and the following two min_nelts sequences form individual linear >> series. > > OK - not sure why we need exactly three nelts per pattern here.
There are three styles of encoding (see the VECTOR_CST docs in generic.texi for the full gory details): - replicated {a0,...,an} (1 element per pattern) - {a0,...,an} followed by replicated {b0,...,bn} (2 elements per pattern) - {a0,...,an} followed by {b0,...,bn,b0+step0,...,bn+stepn,b0+step0*2,...} (3 elements per pattern) The min_elts check ensures that the difference from the identity permute selector is all in {a0,...,an}. The rest of the vector contains the normal elements for an identity selector and extends for as long as the runtime VL needs it to extend. > It also looks like all the constant_multiple_p () checks constrain > things quite a bit. I don't think it constrains it beyond what we can reasonably do. For SVE this is most likely to be useful when converting between SVE and Advanced SIMD. > Oh, and does a BIT_FIELD_REF with poly-int position > extract multiple elements in the end?! For the case we are extracting > a sub-vector and thus elemsz != 1 we constrain it so that this > sub-vector is not of variable size (err, not "independently" so, > whatever that means..)? No, a poly-int position doesn't change how many elements we extract. It just defers the calculation of the position until runtime. > My brain hurts... how do you write a GIMPLE testcase for aarch64 > SVE covering such cases? Gimple testcase: with difficulty :-) I don't think we have a gimple FE syntax for poly_ints yet. It might be possible to construct a C testcase using intrinsics. I'll give it a go... >> This ought to be work for both SVE and non-SVE, although obviously >> there's a bit of wasted work for non-SVE. >> >> (And thanks for asking :-)) > > So like this, it seems to still work on the x86 testcases? LGTM. I think the elemsz calculate is going to run into the same kind of trouble as PR94980 for AVX/SVE vector booleans, but that shouldn't hold the patch up. Thanks, Richard