On Thu, Feb 1, 2024 at 1:26 AM Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi All,
>
> In the vget_set_lane_1.c test the following entries now generate a zip1
> instead of an INS
>
> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
> BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
> BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
>
> This is because the non-Q variant for indices 0 and 1 are just shuffling
> values. There is no perf difference between INS SIMD to SIMD and ZIP,
> as such just update the test file.

Hmm, is this true on all cores? I suspect there is a core out there
where INS is implemented with a much lower latency than ZIP.
If we look at config/aarch64/thunderx.md, we can see INS is 2 cycles
while ZIP is 6 cycles (3/7 for q versions).
Now I don't have any vested interest in that core any more, but I
just wanted to point out that this is not exactly true for all cores.

> Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

This is PR 112375 by the way.

Thanks,
Andrew Pinski

>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/vget_set_lane_1.c: Update test output.
>
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
> index 07a77de319206c5c6dad1c0d2d9bcc998583f9c1..a3978f68e4ff5899f395a98615a5e86c3b1389cb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
> @@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
>  BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
>  BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
>  BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
> -/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
> +/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
>
>  BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15)
>  BUILD_TEST (int8x8_t, int8x16_t, , q, s8, 7, 15)
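
For reference, a minimal sketch of why the two scan-assembler patterns in
the patch describe the same result for the two-lane (non-Q) 32-bit case.
This is a portable C model, not NEON code: the vec2x32 type and the
model_* and count_mismatches names are hypothetical and exist only for
illustration. It models lane values only and says nothing about latency,
which is the point raised in the reply.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit vector of two 32-bit lanes
   (the shape of uint32x2_t); not a real NEON type.  */
typedef struct { uint32_t lane[2]; } vec2x32;

/* Model of "ins v0.s[1], v1.s[0]": copy lane 0 of b into lane 1 of a,
   leaving lane 0 of a unchanged.  */
static vec2x32 model_ins_lane1_from_lane0 (vec2x32 a, vec2x32 b)
{
  a.lane[1] = b.lane[0];
  return a;
}

/* Model of "zip1 vd.2s, vn.2s, vm.2s": interleave the low halves of the
   two inputs.  With only two lanes that is simply { n[0], m[0] }.  */
static vec2x32 model_zip1_2s (vec2x32 n, vec2x32 m)
{
  vec2x32 r = { { n.lane[0], m.lane[0] } };
  return r;
}

/* Lane-by-lane comparison of two modelled vectors.  */
static int same_lanes (vec2x32 x, vec2x32 y)
{
  return x.lane[0] == y.lane[0] && x.lane[1] == y.lane[1];
}

/* Compare both models over a few sample inputs and return the number of
   disagreements.  */
static int count_mismatches (void)
{
  static const uint32_t samples[] = { 0u, 1u, 0x7fffffffu, 0xffffffffu };
  const int n = (int) (sizeof samples / sizeof samples[0]);
  int mismatches = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      {
        vec2x32 a = { { samples[i], samples[j] } };
        vec2x32 b = { { samples[j], samples[i] } };
        if (!same_lanes (model_ins_lane1_from_lane0 (a, b),
                         model_zip1_2s (a, b)))
          mismatches++;
      }
  return mismatches;
}
```

In this model both operations yield { a[0], b[0] }, which is consistent
with the compiler being free to pick either instruction for the lane 1, 0
tests; which one is cheaper remains a per-core scheduling question.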