Hi everyone,
I've noticed that we include xsimd as an abstraction over all of the SIMD
architectures. I'd like to propose a different approach that would result
in fewer lines of code while being more readable.

My thinking is that anything simple enough to abstract with xsimd can be
autovectorized by the compiler, while any more interesting SIMD algorithm
is usually tailored to the target instruction set and can't be abstracted
away with xsimd anyway.

With that in mind, I'd like to propose the following strategy (a rough
sketch follows the list):
1. Write a single source file with simple, element-at-a-time for-loop
implementations of each function.
2. Compile this same source file several times with different compile flags
for different vectorization levels (e.g., on an x86 machine that supports
AVX2 and AVX512, we'd compile once with -mavx2 and once with -mavx512vl).
3. Differentiate the functions compiled for different instruction sets by a
namespace that gets defined during the compiler invocation. For example,
for AVX2 we'd invoke the compiler with -DNAMESPACE=AVX2, and then for
something like elementwise addition of two arrays we'd call
arrow::AVX2::VectorAdd.
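
To make this concrete, here is a minimal sketch of what the shared source
file could look like. The file name, the VectorAdd signature, and the int32
element type are placeholders I made up for illustration, not a proposal
for the actual kernel signatures:

// vector_add.cc (hypothetical name) -- compiled once per target ISA, e.g.
//   c++ -O3 -mavx2      -DNAMESPACE=AVX2   -c vector_add.cc -o vector_add_avx2.o
//   c++ -O3 -mavx512vl  -DNAMESPACE=AVX512 -c vector_add.cc -o vector_add_avx512.o
#include <cstdint>

namespace arrow {
namespace NAMESPACE {  // NAMESPACE is supplied on the compiler command line

// Plain element-at-a-time loop; the compiler autovectorizes it according
// to the -m flags used for this particular compilation.
void VectorAdd(const int32_t* left, const int32_t* right, int32_t* out,
               int64_t length) {
  for (int64_t i = 0; i < length; ++i) {
    out[i] = left[i] + right[i];
  }
}

}  // namespace NAMESPACE
}  // namespace arrow

The same file compiled with -DNAMESPACE=AVX2 -mavx2 yields
arrow::AVX2::VectorAdd, and with -DNAMESPACE=AVX512 -mavx512vl it yields
arrow::AVX512::VectorAdd, with the compiler free to vectorize the loop
differently in each translation unit.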

I believe this would let us remove xsimd as a dependency while also giving
us lots of vectorized kernels, at the cost of some extra CMake magic. After
that, it would just be a matter of making the function registry point to
these new functions.
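
For that last step, something along these lines is what I'm imagining; this
is not the actual registry API, just a sketch of picking the best compiled
variant at runtime and handing that function pointer to the registry. The
per-ISA declarations are assumed to come from compiling the shared source
file above with -DNAMESPACE=AVX512 / -DNAMESPACE=AVX2 / -DNAMESPACE=SCALAR:

#include <cstdint>

namespace arrow {
namespace AVX512 { void VectorAdd(const int32_t*, const int32_t*, int32_t*, int64_t); }
namespace AVX2   { void VectorAdd(const int32_t*, const int32_t*, int32_t*, int64_t); }
namespace SCALAR { void VectorAdd(const int32_t*, const int32_t*, int32_t*, int64_t); }
}  // namespace arrow

using VectorAddFn = void (*)(const int32_t*, const int32_t*, int32_t*, int64_t);

// __builtin_cpu_supports is a GCC/Clang builtin for x86 runtime feature
// detection; the real code would use whatever CPU-info facility Arrow
// already has.
VectorAddFn PickVectorAdd() {
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx512vl")) return arrow::AVX512::VectorAdd;
  if (__builtin_cpu_supports("avx2")) return arrow::AVX2::VectorAdd;
  return arrow::SCALAR::VectorAdd;
}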

Please let me know your thoughts!

Thanks,
Sasha Krassovsky
