This topic was discussed in an earlier thread [1], but nothing has landed yet.

PR https://github.com/apache/arrow/pull/9424 optimizes ByteStreamSplit with Arm64 NEON. This may be a good opportunity to evaluate whether we can simplify our architecture-dependent SIMD code by adopting a SIMD library.

I did a quick comparison of the four open source SIMD libraries mentioned in earlier discussions. All of them support C++11, GCC/Clang/MSVC, x86 (up to AVX512), and Arm64 NEON, and all have permissive open source licenses.

Some differences:

- nsimd: https://github.com/agenium-scale/nsimd
  * supports Arm64 SVE, CUDA
  * needs installation, not header only
  * 133 stars, 11 contributors

- mipp: https://github.com/aff3ct/MIPP
  * header only
  * 241 stars, 2 contributors

- xsimd: https://github.com/xtensor-stack/xsimd
  * header only
  * 938 stars, 28 contributors

- libsimdpp: https://github.com/p12tic/libsimdpp
  * supports PPC, MIPS
  * header only
  * 906 stars, 17 contributors

I have a little experience with libsimdpp. It's straightforward to use, and I suppose the other libraries are similar.

I would prefer xsimd, simply because it has more stars and contributors, and a more active community.

[1] https://mail-archives.apache.org/mod_mbox/arrow-dev/202006.mbox/%3C3667345c-fdd2-5bbd-9bff-023282c377d8%40python.org%3E