On Tue, May 12, 2020 at 10:19 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> Hi Wes,
> I think you highlighted the two issues well, but they are somewhat
> orthogonal: runtime dispatching only addresses the binary availability
> of the optimizations (and it actually makes testing harder, because it
> can potentially hide untested code paths).

Since I develop on an AVX512-capable machine, if we have runtime
dispatching then I should be able to test all variants of a function
from a single executable / test run rather than having to produce
multiple builds and test them separately, right?

Presumably the SIMD level at runtime would be configurable, so you
could either let it be selected automatically based on your CPU
capabilities or set it manually (e.g. if you want to do perf testing
with SIMD vs. no SIMD at runtime).

> Personally, I think it is valuable to have SIMD optimization in the code
> base even if our binaries aren't shipped with them as long as we have
> sufficient regression testing.
>
> For testability, I think there are two issues:
> A.  Resources available to test architecture-specific code - To solve this
> issue I think we should choose a "latest" architecture to target.  Community
> members who want to target a more modern architecture than the one the
> community agreed upon would have the onus of augmenting testing resources
> for that architecture.  The recent big-endian CI coverage is a good
> example of this.  I don't think it is heavy-handed to reject PRs when we
> don't have sufficient CI coverage.
>
> B.  Ensuring we have sufficient test coverage for all code paths.  I
> think this breaks down into how we structure our code.  I know I've
> submitted a recent PR that makes it difficult to test each path separately;
> I will try to address this before submission.  Note that structuring
> the code so that each path can be tested independently is a precursor to
> runtime dispatch.  Once we agree on a "latest" architecture, if the code is
> structured appropriately, we should get sufficient code coverage by
> targeting the community-decided "latest" architecture for most builds (and
> not have to do a full matrix of architecture variants).
>
> Thanks,
> Micah
>
> On Tue, May 12, 2020 at 6:47 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi,
> >
> > We've started to receive a number of patches providing SIMD operations
> > for both x86 and ARM architectures. Most of these patches make use of
> > compiler definitions to toggle between code paths at compile time.
> >
> > This is problematic for a few reasons:
> >
> > * Binaries that are shipped (e.g. in Python) must generally be
> > compiled for a broad set of supported processors. That means that AVX2
> > / AVX512 optimizations won't be available in these builds even on
> > processors that have them.
> > * It poses a maintainability and testing problem (it is hard to test
> > every combination, and it is not practical for local development to
> > compile every combination, which may cause drawn-out test/CI/fix cycles).
> >
> > Other projects (e.g. NumPy) have taken the approach of building
> > binaries that contain multiple variants of a function with different
> > levels of SIMD, and then choosing at runtime which one to execute
> > based on what features the CPU supports. This seems like what we
> > ultimately need to do in Apache Arrow, and if we continue to accept
> > patches that do not do this, it will be much more work later when we
> > have to refactor things to runtime dispatching.
> >
> > We have some PRs in the queue related to SIMD. Without taking a
> > heavy-handed approach like starting to veto PRs, how would everyone
> > like to begin addressing the runtime dispatching problem?
> >
> > Note that the Kernels revamp project I am working on right now will
> > also facilitate runtime SIMD kernel dispatching for array expression
> > evaluation.
> >
> > Thanks,
> > Wes
> >
