Andrew Pinski <pins...@gmail.com> writes:
> On Thu, Feb 1, 2024 at 1:26 AM Tamar Christina <tamar.christ...@arm.com> wrote:
>>
>> Hi All,
>>
>> In the vget_set_lane_1.c test the following entries now generate a zip1
>> instead of an INS
>>
>> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
>> BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
>> BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
>>
>> This is because the non-Q variant for indices 0 and 1 is just shuffling
>> values.
>> There is no perf difference between INS SIMD to SIMD and ZIP, so just
>> update the test file.
> Hmm, is this true on all cores? I suspect there is a core out there
> where INS is implemented with a much lower latency than ZIP.
> If we look at config/aarch64/thunderx.md, we can see INS is 2 cycles
> while ZIP is 6 cycles (3/7 for q versions).
> Now I don't have any vested interest in that core any more but I
> just wanted to point out that this is not true for all cores.
Thanks for the pointer. In that case, perhaps we should prefer
aarch64_evpc_ins over aarch64_evpc_zip in aarch64_expand_vec_perm_const_1?
That's enough to fix this failure, but it'll probably require other
tests to be adjusted...

Richard