> So rather than using xsimd::batch<uint32_t, 16> for an AVX512 batch,
> you would do xsimd::batch<uint32_t, xsimd::arch::avx512> (or e.g.
> neon/neon64 for ARM ISAs) and then access the batch size through the
> batch::size static property.
Glad to see xsimd use 'Arch' as the parameter of a 'batch'. For
ARROW-11502 <https://github.com/apache/arrow/pull/9424>, I have submitted
several PRs to xsimd to hide arch-dependent code in Arrow and avoid a large
maintenance burden. But we found it is hard to design an arch-independent
API for a specific feature that covers all the different ISAs: some features
exist on x86 but not on Arm64, and vice versa, and unifying these
differences would only add to the maintenance burden. I agree with Yibo that
we should use the new xsimd approach for dynamic runtime dispatch to each
CPU support level.

BRs,
Yuqi

Yibo Cai <yibo....@arm.com> wrote on Mon, Jul 19, 2021 at 10:55 AM:
>
>
> On 7/17/21 12:08 AM, Wes McKinney wrote:
> > hi folks,
> >
> > I had a conversation with the developers of xsimd last week in Paris
> > and was made aware that they are working on a substantial refactor of
> > xsimd to improve its usability for cross-compilation and
> > dynamic-dispatch based on runtime processor capabilities. The branch
> > with the refactor is located here:
> >
> > https://github.com/xtensor-stack/xsimd/tree/feature/xsimd-refactoring
> >
> > In particular, the simd batch API is changing from
> >
> > template <class T, size_t N>
> > class batch;
> >
> > to
> >
> > template <class T, class arch>
> > class batch;
> >
> > So rather than using xsimd::batch<uint32_t, 16> for an AVX512 batch,
> > you would do xsimd::batch<uint32_t, xsimd::arch::avx512> (or e.g.
> > neon/neon64 for ARM ISAs) and then access the batch size through the
> > batch::size static property.
>
> Adding this 'arch' parameter is a bit strange at first glance, given that
> the purpose of a SIMD wrapper is to hide arch-dependent code.
> But as the latest SIMD ISAs (SVE, AVX-512) have much richer features than
> simply widening the data width, it looks like arch-specific code is a must.
> I think this change won't cause trouble to existing xsimd client code.
>
> >
> > A few comments for discussion / investigation:
> >
> > * Firstly, we will have to prepare ourselves to migrate to this new
> > API in the future
> >
> > * At some point, we will likely want to generate SIMD variants of our
> > C++ math kernels usable via dynamic dispatch for each different CPU
> > support level. It would be beneficial to author as much code as possible
> > in an ISA-independent fashion that can be cross-compiled to generate
> > binary code for each ISA. We should investigate whether the new approach
> > in xsimd will provide what we need or if we need to take a different
> > approach.
> >
> > * We have some of our own dynamic dispatch code to enable runtime
> > function pointer selection based on available SIMD levels. Can we
> > benefit from any of the work that is happening in this xsimd refactor?
>
> I think they have some overlap. Runtime dispatch at the xsimd level (SIMD
> code block) looks better than at the kernel dispatch level, IIUC.
>
> >
> > * We have some compute code (e.g. hash tables for aggregation / joins)
> > that uses explicit AVX2 intrinsics — can some of this code be ported
> > to use generic xsimd APIs or will we need to use a different
> > fundamental algorithm design to yield maximum efficiency for each SIMD
> > ISA?
> >
> > Thanks,
> > Wes
> >
>
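For concreteness, here is a minimal sketch of how an arch-parameterized
kernel could look under the batch<T, Arch> API discussed above. The arch tag
names (xsimd::arch::avx512, xsimd::arch::neon64) and batch::size are taken
from the quoted mail; the add_u32 helper and the load_unaligned /
store_unaligned calls are illustrative assumptions based on current xsimd
conventions, not code from the refactoring branch:

// Sketch only: an arch tag replaces the old batch-width parameter.
#include <cstddef>
#include <cstdint>
#include <xsimd/xsimd.hpp>

template <class Arch>
void add_u32(const uint32_t* a, const uint32_t* b, uint32_t* out, size_t n) {
  using batch = xsimd::batch<uint32_t, Arch>;
  constexpr size_t width = batch::size;  // lane count derived from the arch tag
  size_t i = 0;
  for (; i + width <= n; i += width) {
    batch va = batch::load_unaligned(a + i);
    batch vb = batch::load_unaligned(b + i);
    (va + vb).store_unaligned(out + i);
  }
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];  // scalar tail for the remainder
  }
}

Each instantiation (e.g. add_u32<xsimd::arch::avx512> on x86,
add_u32<xsimd::arch::neon64> on ARM) could then be selected through Arrow's
existing function-pointer dispatch based on the SIMD level detected at
runtime.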