Hi Wes,
I think you highlighted the two issues well, but they are somewhat
orthogonal: runtime dispatching only addresses the binary availability
of the optimizations (and it actually makes testing harder, because it can
potentially hide untested code paths).

Personally, I think it is valuable to have SIMD optimizations in the code
base, even if our binaries aren't shipped with them, as long as we have
sufficient regression testing.

For testability, I think there are two issues:
A.  Resources available to test architecture-specific code - To solve this
issue, I think we should choose a "latest" architecture to target.  Community
members that want to target a more modern architecture than the
community-agreed-upon one would have the onus of augmenting testing resources
with that architecture.  The recent Big-Endian CI coverage is a good
example of this.  I don't think it is heavy-handed to reject PRs if we
don't have sufficient CI coverage.

B.  Ensuring we have sufficient test coverage for all code paths.  I
think this comes down to how we structure our code.  I know I've recently
submitted a PR that makes it difficult to test each path separately; I
will try to address this before submission.  Note that structuring the
code so that each path can be tested independently is a precursor to
runtime dispatch (see the sketch below).  Once we agree on a "latest"
architecture, if the code is structured appropriately, we should get
sufficient code coverage by targeting the community-decided "latest"
architecture for most builds (and not having to do a full matrix of
architecture variants).
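
To make that concrete, here is a rough sketch of what I have in mind
(the names ARROW_HAVE_AVX2, CpuSupportsAvx2, etc. are illustrative, not
necessarily what we'd actually use): each SIMD variant is an ordinary
function with the same signature, so unit tests can call every variant
that was compiled in, and only the dispatcher inspects CPU features at
runtime.

  // sum_dispatch.h -- illustrative sketch, not actual Arrow code
  #include <cstdint>

  // Portable fallback, always compiled.
  int64_t SumInt64Scalar(const int64_t* values, int64_t length);

  #if defined(ARROW_HAVE_AVX2)
  // Defined in a separate translation unit compiled with -mavx2.
  int64_t SumInt64Avx2(const int64_t* values, int64_t length);
  #endif

  #if defined(ARROW_HAVE_NEON)
  // Defined in a separate translation unit compiled for ARM NEON.
  int64_t SumInt64Neon(const int64_t* values, int64_t length);
  #endif

  // Hypothetical runtime feature-detection helpers (e.g. cpuid on x86).
  bool CpuSupportsAvx2();
  bool CpuSupportsNeon();

  using SumInt64Fn = int64_t (*)(const int64_t*, int64_t);

  // The only place that looks at the CPU at runtime; everything above
  // can be unit tested directly on the agreed-upon "latest" hardware.
  inline SumInt64Fn GetSumInt64() {
  #if defined(ARROW_HAVE_AVX2)
    if (CpuSupportsAvx2()) return SumInt64Avx2;
  #endif
  #if defined(ARROW_HAVE_NEON)
    if (CpuSupportsNeon()) return SumInt64Neon;
  #endif
    return SumInt64Scalar;
  }

With that shape, a single CI job on the "latest" architecture can call
the scalar and SIMD variants directly in the same test and compare their
results, and the dispatcher itself only needs a small test that it falls
back to the scalar path when a feature is missing.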

Thanks,
Micah

On Tue, May 12, 2020 at 6:47 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi,
>
> We've started to receive a number of patches providing SIMD operations
> for both x86 and ARM architectures. Most of these patches make use of
> compiler definitions to toggle between code paths at compile time.
>
> This is problematic for a few reasons:
>
> * Binaries that are shipped (e.g. in Python) must generally be
> compiled for a broad set of supported processors. That means that AVX2
> / AVX512 optimizations won't be available in these builds for
> processors that have them
> * Poses a maintainability and testing problem (hard to test every
> combination, and it is not practical for local development to compile
> every combination, which may cause drawn out test/CI/fix cycles)
>
> Other projects (e.g. NumPy) have taken the approach of building
> binaries that contain multiple variants of a function with different
> levels of SIMD, and then choosing at runtime which one to execute
> based on what features the CPU supports. This seems like what we
> ultimately need to do in Apache Arrow, and if we continue to accept
> patches that do not do this, it will be much more work later when we
> have to refactor things to runtime dispatching.
>
> We have some PRs in the queue related to SIMD. Without taking a heavy
> handed approach like starting to veto PRs, how would everyone like to
> begin to address the runtime dispatching problem?
>
> Note that the Kernels revamp project I am working on right now will
> also facilitate runtime SIMD kernel dispatching for array expression
> evaluation.
>
> Thanks,
> Wes
>
