(Sorry for the slow reply, was off on Friday)

Richard Biener <richard.guent...@gmail.com> writes:
> On Wed, May 25, 2022 at 10:24 PM Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>>
>> On Thu, 26 May 2022 at 00:37, Richard Biener <richard.guent...@gmail.com> 
>> wrote:
> [...]
>> > x86 now accepts V4SI V8SI permutes because we don’t ask it correctly and 
>> > thus my naive attempt to use the new function API breaks. Not to mention 
>> > the VEC_PERM IL is still rejected. I will wait for the rest of the series 
>> > to be approved and pushed.
>> Hi,
>> I pushed the entire series in ae8decf1d2b8329af59592b4fa78ee8dfab3ba5e
>> after it was approved by Richard S.
>
> Thanks.
>
> Maybe I'm doing it wrong but I now see
>
>           indices.new_vector (mask, second_vec.first == -1U ? 1 : 2, nunits);
>           bool identity_p = indices.series_p (0, 1, 0, 1);
>
> where nunits is 4 and mask {4, 5, 6, 7}, the number of vectors is 1,
> and now indices.series_p (0, 1, 0, 1) returns true despite my input
> vector having 8 elements and 'indices' should select the upper half.
> That's because the function calls clamp() on the encoding but
> clamp() knows nothing about the different nunits of the input vectors.
>
> I suppose vec_perm_indices needs updating to allow for different
> nunits of the input vectors as well?

The final argument to new_vector is supposed to be the number of elements
per input vector, so it sounds like it should be 8 rather than 4 in this
situation.

The number of elements per output vector is taken from the mask argument.

Thanks,
Richard

>
> Where else does this change need adjustments to other APIs?
>
> PR101668 has a naive user of the new capability.  The included
> testcase works OK but trying to expand test coverage quickly
> runs into wrong-code, like for
>
> typedef int v8si __attribute__((vector_size (32)));
> typedef long long v4di __attribute__((vector_size (32)));
>
> void
> bar_s32_s64 (v4di * dst, v8si src)
> {
>   long long tem[8];
>   tem[0] = src[4];
>   tem[1] = src[5];
>   tem[2] = src[6];
>   tem[3] = src[7];
>   dst[0] = *(v4di *) tem;
> }
>
> which I expected to be rejected with -mavx2.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>> > > Thanks,
>> > > Prathamesh
>> > >>
>> > >> At least I have a user in the vectorizer ready - allowing more permutes
>> > >> from existing vectors (of different sizes now) to be SLP vectorized.
>> > >>
>> > >> Thanks,
>> > >> Richard.
>> > >>
>> > >>> Thanks,
>> > >>> Prathamesh
>> > >>>>
>> > >>>> Richard
