Andrew Pinski <pins...@gmail.com> writes:
> On Thu, Feb 1, 2024 at 1:26 AM Tamar Christina <tamar.christ...@arm.com> 
> wrote:
>>
>> Hi All,
>>
>> In the vget_set_lane_1.c test the following entries now generate a ZIP1
>> instead of an INS:
>>
>> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
>> BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
>> BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
>>
>> This is because the non-Q variants for indices 0 and 1 are just shuffling
>> values.  There is no perf difference between INS (SIMD to SIMD) and ZIP,
>> so just update the test file.
> Hmm, is this true on all cores? I suspect there is a core out there
> where INS is implemented with a much lower latency than ZIP.
> If we look at config/aarch64/thunderx.md, we can see INS is 2 cycles
> while ZIP is 6 cycles (3/7 for q versions).
> Now I don't have any vested interest in that core any more, but I
> just wanted to point out that this is not exactly true for all cores.

Thanks for the pointer.  In that case, perhaps we should prefer
aarch64_evpc_ins over aarch64_evpc_zip in aarch64_expand_vec_perm_const_1?
That's enough to fix this failure, but it'll probably require other
tests to be adjusted...
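
For anyone following along, here is a rough portable C model of why both
patterns match the same permutation for the two-lane (non-Q) case.  The
v2si type and the model_* helpers are made up for illustration; they are
not GCC or ACLE names, and the sketch assumes the test sets lane 1 of the
first vector from lane 0 of the second (the "1, 0" indices above).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit vector of two 32-bit lanes
   (e.g. int32x2_t); not a real GCC or ACLE type.  */
typedef struct { int32_t lane[2]; } v2si;

/* Models INS Vd.S[dst], Vn.S[src]: copy one lane of N into one lane
   of D, leaving the other lane of D untouched.  */
static v2si model_ins (v2si d, int dst, v2si n, int src)
{
  d.lane[dst] = n.lane[src];
  return d;
}

/* Models ZIP1 Vd.2S, Va.2S, Vb.2S: interleave the low lanes of A and B.
   With only two lanes the result is simply { a[0], b[0] }.  */
static v2si model_zip1 (v2si a, v2si b)
{
  v2si r = { { a.lane[0], b.lane[0] } };
  return r;
}

/* Returns 1 if both models agree lane by lane for inputs A and B.  */
static int ins_equals_zip1 (v2si a, v2si b)
{
  v2si x = model_ins (a, 1, b, 0);   /* a[1] = b[0] */
  v2si y = model_zip1 (a, b);
  return x.lane[0] == y.lane[0] && x.lane[1] == y.lane[1];
}

/* Small self-check with arbitrary values.  */
static void check_equivalence (void)
{
  v2si a = { { 10, 11 } };
  v2si b = { { 20, 21 } };
  assert (ins_equals_zip1 (a, b));
}
```

Both forms produce { a[0], b[0] }, so which instruction comes out depends
only on which matcher is tried first, not on the semantics.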

Richard
