xsimd has three problems I can think of right now:

1) xsimd code looks like normal SIMD code: you have to explicitly do loads and stores, explicitly unroll and stride through your loop, and explicitly process the tail of the loop. This makes writing a large number of kernels extremely tedious and error-prone. By comparison, a single three-line scalar for loop is easier to both read and write.

2) xsimd limits the freedom an optimizer has to select instructions and perform other optimizations, since it's just a thin wrapper over normal intrinsics. One concrete example: if we wanted to take advantage of the dynamic instruction-set dispatch xsimd offers, the loop strides would no longer be compile-time constants, which might prevent the compiler from unrolling the loop (how would it know that the stride isn't just 1?).

3) Lastly, if we ever want to support a new architecture (like Power9 or RISC-V), we'd have to wait for an xsimd backend to become available. On the other hand, if SiFive came out with a hot new chip supporting RV64V, all we'd have to do to support it is add the appropriate compiler flag to the CMakeLists.
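For concreteness, here's a minimal sketch of what the single-source kernel from my proposal (quoted below) could look like. Everything here is illustrative: the NAMESPACE macro, the SCALAR fallback, and the float-only signature are just stand-ins, with only the arrow::AVX2::VectorAdd naming taken from the proposal.

```cpp
// Sketch: one scalar source file, compiled once per target ISA, e.g.
//   -DNAMESPACE=AVX2 -mavx2    or    -DNAMESPACE=SCALAR
// (all names here are illustrative, not actual Arrow APIs)
#include <cstddef>

#ifndef NAMESPACE
#define NAMESPACE SCALAR  // default when the build passes no flag
#endif

namespace arrow {
namespace NAMESPACE {

// Plain element-at-a-time loop; the compiler autovectorizes it using
// whatever -m flags this particular translation unit was built with.
void VectorAdd(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}

}  // namespace NAMESPACE
}  // namespace arrow
```

Compiling this file twice with the two flag sets above would produce arrow::AVX2::VectorAdd and arrow::SCALAR::VectorAdd from the same three-line loop.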
As for using an external build system, I'm not sure how much complexity it would add, but at the very least I suspect it would work out of the box if you only wanted to support scalar kernels. Otherwise I don't think it would add much more complexity than we currently have detecting architectures at build time.

Sasha

On Tue, Mar 29, 2022 at 3:26 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Sasha,
> Could you elaborate on the problems of the XSIMD dependency? What you
> describe sounds a lot like what XSIMD provides in a prepackaged form and
> without the extra CMake magic.
>
> I have to occasionally build Arrow with an external build system and it
> sounds like this type of logic could add complexity there.
>
> Thanks,
> Micah
>
> On Tue, Mar 29, 2022 at 3:14 PM Sasha Krassovsky
> <krassovskysa...@gmail.com> wrote:
>
> > Hi everyone,
> > I've noticed that we include xsimd as an abstraction over all of the
> > simd architectures. I'd like to propose a different solution which
> > would result in fewer lines of code, while being more readable.
> >
> > My thinking is that anything simple enough to abstract with xsimd can
> > be autovectorized by the compiler. Any more interesting SIMD algorithm
> > usually is tailored to the target instruction set and can't be
> > abstracted away with xsimd anyway.
> >
> > With that in mind, I'd like to propose the following strategy:
> > 1. Write a single source file with simple, element-at-a-time for loop
> > implementations of each function.
> > 2. Compile this same source file several times with different compile
> > flags for different vectorization (e.g. if we're on an x86 machine that
> > supports AVX2 and AVX512, we'd compile once with -mavx2 and once with
> > -mavx512vl).
> > 3. Functions compiled with different instruction sets can be
> > differentiated by a namespace, which gets defined during the compiler
> > invocation. For example, for AVX2 we'd invoke the compiler with
> > -DNAMESPACE=AVX2 and then for something like elementwise addition of
> > two arrays, we'd call arrow::AVX2::VectorAdd.
> >
> > I believe this would let us remove xsimd as a dependency while also
> > giving us lots of vectorized kernels at the cost of some extra cmake
> > magic. After that, it would just be a matter of making the function
> > registry point to these new functions.
> >
> > Please let me know your thoughts!
> >
> > Thanks,
> > Sasha Krassovsky
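To illustrate the last step of the quoted proposal (pointing the function registry at the per-ISA symbols), here is a hypothetical dispatch sketch. In a real build the two namespaces would come from two separate compilations of the same kernel source, and the feature check would use a real CPU query such as cpuid or GCC/Clang's __builtin_cpu_supports; both namespaces are written out inline here, and the check is stubbed as a bool, only so the example is self-contained.

```cpp
// Hypothetical sketch of runtime dispatch over per-ISA namespaces.
// arrow::SCALAR and arrow::AVX2 would normally be produced by compiling
// the same source file with different flags; duplicated here for the sketch.
#include <cstddef>

namespace arrow {
namespace SCALAR {
void VectorAdd(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
}  // namespace SCALAR
namespace AVX2 {
void VectorAdd(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
}  // namespace AVX2
}  // namespace arrow

using VectorAddFn = void (*)(const float*, const float*, float*, std::size_t);

// The registry would resolve this once at startup; cpu_has_avx2 stands in
// for a real CPU-feature check.
VectorAddFn ResolveVectorAdd(bool cpu_has_avx2) {
  return cpu_has_avx2 ? arrow::AVX2::VectorAdd : arrow::SCALAR::VectorAdd;
}
```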