Richard Biener <rguent...@suse.de> writes:
> On Thu, 7 May 2020, Richard Sandiford wrote:
>
>> Richard Biener <rguent...@suse.de> writes:
>> > This implements patterns combining vector element insertion of
>> > vector element extraction to a VEC_PERM_EXPR of both vectors
>> > when supported.  Plus it adds the more generic identity transform
>> > of inserting a piece of itself at the same position.
>> >
>> > Richard - is there anything I can do to make this SVE aware?
>> > I'd need to construct an identity permute and "insert" into
>> > that permute that element from the other (or same) vector.
>> > I suppose for most element positions that won't work but
>> > at least inserting at [0] should?  I'm mostly struggling
>> > on how to use vec_perm_builder here when nelts is not constant,
>> > since it's derived from vec<> can I simply start with
>> > a single pattern with 1 stride and then insert by using []?
>> 
>> I guess for SVE we still want to know that the range is safe
>> for all VL, so after dropping the is_constant check, we'd
>> want something like:
>> 
>>    {
>>      poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
>>      unsigned int min_nelts = constant_lower_bound (nelts);
>>    }
>>    (if (...
>>         && at + elemsz <= min_nelts)
>> 
>> In theory (hah) it should then just be a case of changing the
>> vec_perm_builder constructor to:
>> 
>>           vec_perm_builder sel (nelts, min_nelts, 3);
>> 
>> and then iterating over min_nelts * 3 instead of nelts here:
>> 
>> > +       for (unsigned i = 0; i < nelts; ++i)
>> > +         sel.quick_push (i / elemsz == at
>> > +                   ? nelts + elem * elemsz + i % elemsz : i);
>> 
>> So as far as the encoding goes, the first min_nelts elements are arbitrary
>> values, and the following two min_nelts sequences form individual linear
>> series.
>
> OK - not sure why we need exactly three nelts per pattern here.

There are three styles of encoding (see the VECTOR_CST docs in
generic.texi for the full gory details):

- replicated {a0,...,an} (1 element per pattern)

- {a0,...,an} followed by replicated {b0,...,bn} (2 elements per pattern)

- {a0,...,an} followed by {b0,...,bn,b0+step0,...,bn+stepn,b0+step0*2,...}
  (3 elements per pattern)
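
As a rough illustration of how the three styles expand, here is a minimal stand-alone sketch.  It is not GCC's vector_builder: the names encoded_vector and expand_elt are hypothetical and plain int64_t stands in for the element type.  It assumes the encoded elements are stored with the patterns interleaved, as described in generic.texi.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone model of a pattern-encoded vector.  This is
// not GCC's vector_builder; it only mimics the expansion rule.
struct encoded_vector
{
  std::size_t npatterns;          // number of interleaved patterns
  std::size_t nelts_per_pattern;  // 1, 2 or 3
  std::vector<int64_t> encoded;   // npatterns * nelts_per_pattern values
};

// Return element I of the full vector described by ENC.
int64_t
expand_elt (const encoded_vector &enc, std::size_t i)
{
  assert (enc.nelts_per_pattern >= 1 && enc.nelts_per_pattern <= 3);
  assert (enc.encoded.size () == enc.npatterns * enc.nelts_per_pattern);

  // Explicitly encoded elements are returned as-is.
  if (i < enc.encoded.size ())
    return enc.encoded[i];

  std::size_t pattern = i % enc.npatterns;
  std::size_t count = i / enc.npatterns;  // index within the pattern
  std::size_t final_i = enc.encoded.size () - enc.npatterns + pattern;

  // Styles 1 and 2: the last encoded element of the pattern repeats.
  if (enc.nelts_per_pattern <= 2)
    return enc.encoded[final_i];

  // Style 3: continue the linear series defined by the last two
  // encoded elements of the pattern.
  int64_t prev = enc.encoded[final_i - enc.npatterns];
  int64_t last = enc.encoded[final_i];
  return last + (last - prev) * static_cast<int64_t> (count - 2);
}
```

With two patterns and three elements per pattern, the encoding {100, 200, 0, 10, 1, 12} expands to 100, 200, 0, 10, 1, 12, 2, 14, 3, 16, ...: the first element of each pattern is arbitrary and the remaining two fix the step.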

The min_nelts check ensures that the difference from the identity permute
selector is all in {a0,...,an}.  The rest of the vector contains the normal
elements for an identity selector and extends for as long as the runtime
VL needs it to extend.
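
To make that concrete, the following stand-alone sketch models the selector with min_nelts patterns of 3 elements each and expands it for a chosen runtime length.  It is a simplification under stated assumptions: a single element is inserted (elemsz == 1), a plain uint64_t nelts stands in for the poly_uint64, and build_insert_selector and expand_selector are hypothetical helpers rather than vec_perm_builder calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the encoded selector: 3 * MIN_NELTS elements,
// identity everywhere except position AT, which selects element ELEM of
// the second input vector.  Assumes elemsz == 1.
std::vector<uint64_t>
build_insert_selector (uint64_t nelts, std::size_t min_nelts,
                       std::size_t at, uint64_t elem)
{
  // The changed element must lie in the leading MIN_NELTS elements.
  assert (at < min_nelts && min_nelts <= nelts);
  std::vector<uint64_t> encoded;
  for (std::size_t i = 0; i < min_nelts * 3; ++i)
    encoded.push_back (i == at ? nelts + elem : i);
  return encoded;
}

// Expand ENCODED (MIN_NELTS patterns, 3 elements per pattern) to NELTS
// elements by continuing each pattern's linear series.
std::vector<uint64_t>
expand_selector (const std::vector<uint64_t> &encoded,
                 std::size_t min_nelts, uint64_t nelts)
{
  assert (encoded.size () == min_nelts * 3);
  std::vector<uint64_t> full;
  for (uint64_t i = 0; i < nelts; ++i)
    {
      if (i < encoded.size ())
        {
          full.push_back (encoded[i]);
          continue;
        }
      std::size_t pattern = i % min_nelts;
      uint64_t count = i / min_nelts;
      uint64_t prev = encoded[min_nelts + pattern];
      uint64_t last = encoded[2 * min_nelts + pattern];
      full.push_back (last + (last - prev) * (count - 2));
    }
  return full;
}
```

For min_nelts == 4, at == 1 and elem == 2, a runtime length of 16 gives 0, 18, 2, 3, ..., 15: only element 1 differs from the identity, and each series steps by min_nelts so the tail stays an identity for any longer length.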

> It also looks like all the constant_multiple_p () checks constrain
> things quite a bit.

I don't think it constrains it beyond what we can reasonably do.
For SVE this is most likely to be useful when converting between
SVE and Advanced SIMD.

> Oh, and does a BIT_FIELD_REF with poly-int position
> extract multiple elements in the end?!  For the case we are extracting
> a sub-vector and thus elemsz != 1 we constrain it so that this
> sub-vector is not of variable size (err, not "independently" so,
> whatever that means..)?

No, a poly-int position doesn't change how many elements we extract.
It just defers the calculation of the position until runtime.
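
A small stand-alone sketch of that distinction, loosely modelled on a two-coefficient poly_int (c0 + c1 * x, with x known only at run time).  The names poly_value, evaluate and extract_piece are hypothetical, and positions are counted in elements rather than bits to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a runtime-dependent quantity: c0 + c1 * x.
struct poly_value
{
  uint64_t c0;
  uint64_t c1;
};

// Resolve P once the runtime indeterminate X is known.
uint64_t
evaluate (const poly_value &p, uint64_t x)
{
  return p.c0 + p.c1 * x;
}

// Extract WIDTH elements of VEC starting at the runtime position POS.
// WIDTH is a compile-time-style constant here; only the start depends
// on X.
std::vector<int>
extract_piece (const std::vector<int> &vec, const poly_value &pos,
               uint64_t x, std::size_t width)
{
  uint64_t start = evaluate (pos, x);
  assert (start + width <= vec.size ());
  return std::vector<int> (vec.begin () + start,
                           vec.begin () + start + width);
}
```

With pos = {4, 4} and width 2, x == 0 extracts elements 4 and 5 while x == 2 extracts elements 12 and 13: the start moves with x, the number of elements does not.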

> My brain hurts...  how do you write a GIMPLE testcase for aarch64
> SVE covering such cases?

Gimple testcase: with difficulty :-)  I don't think we have a
gimple FE syntax for poly_ints yet.

It might be possible to construct a C testcase using intrinsics.
I'll give it a go...

>> This ought to work for both SVE and non-SVE, although obviously
>> there's a bit of wasted work for non-SVE.
>> 
>> (And thanks for asking :-))
>
> So like this, it seems to still work on the x86 testcases?

LGTM.  I think the elemsz calculation is going to run into the same
kind of trouble as PR94980 for AVX/SVE vector booleans, but that
shouldn't hold the patch up.

Thanks,
Richard
