(Sorry for the slow reply, was off on Friday)

Richard Biener <richard.guent...@gmail.com> writes:
> On Wed, May 25, 2022 at 10:24 PM Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>>
>> On Thu, 26 May 2022 at 00:37, Richard Biener <richard.guent...@gmail.com>
>> wrote:
> [...]
>> > x86 now accepts V4SI V8SI permutes because we don’t ask it correctly and
>> > thus my naive attempt to use the new function API breaks . Not to mention
>> > the VEC_PERM IL is still rejected. I will wait for the rest of the series
>> > to be approved and pushed.
>> Hi,
>> I pushed the entire series in ae8decf1d2b8329af59592b4fa78ee8dfab3ba5e
>> after it was approved by Richard S.
>
> Thanks.
>
> Maybe I'm doing it wrong but I now see
>
>   indices.new_vector (mask, second_vec.first == -1U ? 1 : 2, nunits);
>   bool identity_p = indices.series_p (0, 1, 0, 1);
>
> where nunits is 4 and mask {4, 5, 6, 7}, the number of vectors is 1,
> and now indices.series_p (0, 1, 0, 1) returns true despite my input
> vector having 8 elements and 'indices' should select the upper half.
> That's because the function calls clamp() on the encoding but
> clamp() knows nothing about the different nunits of the input vectors.
>
> I suppose vec_perm_indices needs updating to allow for different
> nunits of the input vectors as well?

The final argument to new_vector is supposed to be the number of elements
per input vector, so it sounds like it should be 8 rather than 4 in this
situation.  The number of elements per output vector is taken from the
mask argument.

Thanks,
Richard

>
> Where else does this change need adjustments to other APIs?
>
> PR101668 has a naiive user of the new capability.  The included
> testcase works OK but trying to expand test coverage quickly
> runs into wrong-code, like for
>
> typedef int v8si __attribute__((vector_size (32)));
> typedef long long v4di __attribute__((vector_size (32)));
>
> void
> bar_s32_s64 (v4di * dst, v8si src)
> {
>   long long tem[8];
>   tem[0] = src[4];
>   tem[1] = src[5];
>   tem[2] = src[6];
>   tem[3] = src[7];
>   dst[0] = *(v4di *) tem;
> }
>
> which I expected to be rejected with -mavx2.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>> > > Thanks,
>> > > Prathamesh
>> > >>
>> > >> At least I have a user in the vectorizer ready - allowing more permutes
>> > >> from existing vectors (of different sizes now) to be SLP vectorized.
>> > >>
>> > >> Thanks,
>> > >> Richard.
>> > >>
>> > >>> Thanks,
>> > >>> Prathamesh
>> > >>>>
>> > >>>> Richard
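
To illustrate the point about the final argument to new_vector, here is a
rough, self-contained toy model.  It is not GCC's vec_perm_indices: the
helpers toy_clamp, toy_identity_p and toy_example are hypothetical names,
and the sketch assumes that clamping reduces each index modulo
(number of inputs * elements per input vector).  Under that assumption,
passing 4 as the per-input element count wraps the mask {4, 5, 6, 7} to
{0, 1, 2, 3}, which looks like an identity permute, whereas passing 8
keeps it as a selection of the upper half.

```cpp
// Toy model only: this is NOT GCC's vec_perm_indices.  It assumes that
// clamping reduces each index modulo (ninputs * nelts_per_input).
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: wrap each selector index into the range
// [0, ninputs * nelts_per_input).
std::vector<unsigned>
toy_clamp (const std::vector<unsigned> &mask, unsigned ninputs,
           unsigned nelts_per_input)
{
  const unsigned limit = ninputs * nelts_per_input;
  std::vector<unsigned> out;
  out.reserve (mask.size ());
  for (unsigned idx : mask)
    out.push_back (idx % limit);
  return out;
}

// Hypothetical helper: true if the clamped selector is 0, 1, 2, ...
bool
toy_identity_p (const std::vector<unsigned> &mask, unsigned ninputs,
                unsigned nelts_per_input)
{
  const std::vector<unsigned> sel
    = toy_clamp (mask, ninputs, nelts_per_input);
  for (std::size_t i = 0; i < sel.size (); ++i)
    if (sel[i] != i)
      return false;
  return true;
}

// The situation from the thread: one 8-element input, 4-element mask.
void
toy_example ()
{
  const std::vector<unsigned> mask = { 4, 5, 6, 7 };
  // Per-input count given as 4: indices wrap and look like an identity.
  assert (toy_identity_p (mask, 1, 4));
  // Per-input count given as 8: the mask selects the upper half.
  assert (!toy_identity_p (mask, 1, 8));
}
```

Calling toy_example () from a test driver exercises both cases.  The
number of output elements comes from the mask's own length in this model,
which mirrors the statement above that the output element count is taken
from the mask argument.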