Hi Wes,

I think you highlighted the two issues well, but they are somewhat orthogonal: runtime dispatching only addresses the binary availability of the optimizations (and it can actually make testing harder, because it can hide untested code paths).
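For anyone following along, a minimal sketch of what runtime dispatch looks like in C++ (the function names here are hypothetical, not Arrow's actual API; `__builtin_cpu_supports` is a real GCC/Clang builtin on x86):

```cpp
#include <cstddef>
#include <cstdint>

// Scalar fallback path, always available.
static int64_t SumScalar(const int32_t* data, size_t n) {
  int64_t total = 0;
  for (size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

// Hypothetical AVX2 path. In a real build this body would be vectorized and
// live in a translation unit compiled with -mavx2; here it just delegates
// so the sketch stays portable.
static int64_t SumAvx2(const int32_t* data, size_t n) {
  return SumScalar(data, n);
}

// The dispatcher picks an implementation once, on first call, based on what
// the running CPU actually supports -- not on compile-time flags.
int64_t Sum(const int32_t* data, size_t n) {
  static auto* impl = __builtin_cpu_supports("avx2") ? SumAvx2 : SumScalar;
  return impl(data, n);
}
```

This is exactly where the testing concern comes in: a CI machine without AVX2 would silently exercise only the scalar branch, which is why I think test structure matters as much as the dispatch mechanism itself.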
Personally, I think it is valuable to have SIMD optimizations in the code base even if our binaries aren't shipped with them, as long as we have sufficient regression testing. For testability, I think there are two issues:

A. Resources available to test architecture-specific code. To solve this, I think we choose a "latest" architecture to target. Community members who want to target a more modern architecture than the community-agreed-upon one would have the onus of augmenting our testing resources with that architecture. The recent big-endian CI coverage is a good example of this. I don't think it is heavy-handed to reject PRs if we don't have sufficient CI coverage.

B. Ensuring we have sufficient test coverage for all code paths. I think this breaks down into how we structure our code. I know I've submitted a recent PR that makes it difficult to test each path separately; I will try to address this before submission. Note that structuring the code so that each path can be tested independently is a precursor to runtime dispatch.

Once we agree on a "latest" architecture, if the code is structured appropriately, we should get sufficient code coverage by targeting the community-decided "latest" architecture for most builds (and not have to do a full matrix of architecture combinations).

Thanks,
Micah

On Tue, May 12, 2020 at 6:47 PM Wes McKinney <wesmck...@gmail.com> wrote:
> hi,
>
> We've started to receive a number of patches providing SIMD operations
> for both x86 and ARM architectures. Most of these patches make use of
> compiler definitions to toggle between code paths at compile time.
>
> This is problematic for a few reasons:
>
> * Binaries that are shipped (e.g. in Python) must generally be
> compiled for a broad set of supported processors.
> That means that AVX2 / AVX512 optimizations won't be available in
> these builds for processors that have them
>
> * Poses a maintainability and testing problem (hard to test every
> combination, and it is not practical for local development to compile
> every combination, which may cause drawn-out test/CI/fix cycles)
>
> Other projects (e.g. NumPy) have taken the approach of building
> binaries that contain multiple variants of a function with different
> levels of SIMD, and then choosing at runtime which one to execute
> based on what features the CPU supports. This seems like what we
> ultimately need to do in Apache Arrow, and if we continue to accept
> patches that do not do this, it will be much more work later when we
> have to refactor things to runtime dispatching.
>
> We have some PRs in the queue related to SIMD. Without taking a
> heavy-handed approach like starting to veto PRs, how would everyone
> like to begin to address the runtime dispatching problem?
>
> Note that the Kernels revamp project I am working on right now will
> also facilitate runtime SIMD kernel dispatching for array expression
> evaluation.
>
> Thanks,
> Wes
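P.S. A rough sketch of what I mean by "each path can be tested independently" (all names here are hypothetical, and the second variant is a plain-C++ stand-in for what would really be a SIMD body compiled in its own translation unit): every variant is exposed with a common signature, so one test can iterate over all implementations the CI machine supports instead of covering only whichever one the dispatcher happens to pick.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each variant is a plain function with the same signature, visible to tests.
int64_t SumScalar(const std::vector<int32_t>& v) {
  int64_t total = 0;
  for (int32_t x : v) total += x;
  return total;
}

// Second "variant": a manually unrolled loop standing in for a real SIMD
// implementation that would live in its own -mavx2 translation unit.
int64_t SumUnrolled(const std::vector<int32_t>& v) {
  int64_t total = 0;
  size_t i = 0;
  for (; i + 4 <= v.size(); i += 4)
    total += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
  for (; i < v.size(); ++i) total += v[i];
  return total;
}

// The test loops over every variant and checks them against the same
// expected results, so each code path gets direct coverage.
void TestAllVariants() {
  using SumFn = int64_t (*)(const std::vector<int32_t>&);
  const SumFn variants[] = {SumScalar, SumUnrolled};
  const std::vector<int32_t> input = {1, 2, 3, 4, 5};
  for (SumFn fn : variants) {
    assert(fn(input) == 15);
    assert(fn({}) == 0);
  }
}
```

With this structure in place, adding a runtime dispatcher on top is mechanical, which is why I called it a precursor.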