hi,

We've started to receive a number of patches providing SIMD operations for both x86 and ARM architectures. Most of these patches make use of compiler definitions to toggle between code paths at compile time.
This is problematic for a few reasons:

* Binaries that are shipped (e.g. in Python) must generally be compiled for a broad set of supported processors. That means that AVX2 / AVX512 optimizations won't be available in these builds, even on processors that have them.
* It poses a maintainability and testing problem: it is hard to test every combination, and it is not practical to compile every combination during local development, which may cause drawn-out test/CI/fix cycles.

Other projects (e.g. NumPy) have taken the approach of building binaries that contain multiple variants of a function with different levels of SIMD, and then choosing at runtime which one to execute based on what features the CPU supports. This seems like what we ultimately need to do in Apache Arrow. If we continue to accept patches that do not do this, it will be much more work later when we have to refactor things to use runtime dispatching.

We have some PRs in the queue related to SIMD. Without taking a heavy-handed approach like starting to veto PRs, how would everyone like to begin to address the runtime dispatching problem?

Note that the Kernels revamp project I am working on right now will also facilitate runtime SIMD kernel dispatching for array expression evaluation.

Thanks,
Wes
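
P.S. For anyone who hasn't seen the pattern, here is a minimal sketch of what "multiple variants in one binary, chosen at runtime" can look like. It is only an illustration, not a proposal for how Arrow should structure it. All names (`sketch::Sum`, `SumScalar`, `SumAvx2`, `ResolveSum`, `SKETCH_HAVE_X86_DISPATCH`) are hypothetical, and the AVX2 path assumes GCC or Clang on x86, since it relies on their `target` function attribute and `__builtin_cpu_supports`. On other compilers or architectures the sketch falls back to the scalar variant.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC or Clang on x86; otherwise only the scalar path is built.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_X86_DISPATCH 1
#endif

namespace sketch {  // hypothetical namespace, for illustration only

// Baseline variant: always compiled, runs on any CPU.
std::int64_t SumScalar(const std::int32_t* values, std::size_t length) {
  std::int64_t total = 0;
  for (std::size_t i = 0; i < length; ++i) total += values[i];
  return total;
}

#ifdef SKETCH_HAVE_X86_DISPATCH
// AVX2 variant: the target attribute enables AVX2 for this function only,
// so the rest of the binary still targets the baseline instruction set.
__attribute__((target("avx2")))
std::int64_t SumAvx2(const std::int32_t* values, std::size_t length) {
  __m256i acc = _mm256_setzero_si256();  // four 64-bit partial sums
  std::size_t i = 0;
  for (; i + 4 <= length; i += 4) {
    // Load four int32 values and sign-extend them to four int64 lanes.
    __m128i v32 =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(values + i));
    acc = _mm256_add_epi64(acc, _mm256_cvtepi32_epi64(v32));
  }
  alignas(32) std::int64_t lanes[4];
  _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
  std::int64_t total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  for (; i < length; ++i) total += values[i];  // leftover tail elements
  return total;
}
#endif

using SumFn = std::int64_t (*)(const std::int32_t*, std::size_t);

// Pick the best variant that the running CPU supports.
SumFn ResolveSum() {
#ifdef SKETCH_HAVE_X86_DISPATCH
  if (__builtin_cpu_supports("avx2")) return SumAvx2;
#endif
  return SumScalar;
}

// Public entry point: the variant is resolved once (function-local static
// initialization is thread-safe since C++11) and every call then goes
// through the cached function pointer.
std::int64_t Sum(const std::int32_t* values, std::size_t length) {
  static const SumFn impl = ResolveSum();
  return impl(values, length);
}

}  // namespace sketch
```

Callers only ever see `sketch::Sum(data, n)`. The same binary uses the AVX2 variant on CPUs that report AVX2 and the scalar one everywhere else, and both variants can be exercised directly in a single test build on an AVX2-capable machine.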