Hi,

I added AVX-512 kernels a while back, but they were removed for a reason I don't agree with. (No objections, though; that was the maintainers' decision.)

I am totally in line with you: I prefer feature-gated SIMD supersets at build time, with compile-time selection. I was working on compile-time selection but dropped the work. For example, bringing AVX-512 back to the Rust implementation of Arrow would only take a simple revert.

Best,
Theo

On Sat, Jan 7, 2023, 15:36 Antoine Pitrou <anto...@python.org> wrote:

>
> Hi,
>
> For reference, the C++ implementation is compiled by default with SSE4.2
> enabled. We had some rare bug reports of people using very old CPUs
> where Arrow C++ would crash (for example for lack of the POPCNT instruction,
> which is very useful for fast null count computation).
>
> We also have some dynamic dispatch for select routines where AVX2 or
> AVX512 paths are available.
>
> AVX2 by default is probably too contentious for the time being, IMHO.
>
> Regards
>
> Antoine.
>
>
> On 07/01/2023 at 13:08, Raphael Taylor-Davies wrote:
> > Hi,
> >
> > It is fairly common to see binaries in the wild making use of the Rust
> > arrow libraries compiled with extremely limited SIMD support enabled. As
> > I imagine others in the community have run into this before, I thought
> > I'd send an email to solicit thoughts.
> >
> > There are a couple of things that make the Rust implementation
> > particularly susceptible to this problem:
> >
> > - Rust lacks a stable ABI, and so all builds are from source
> > - The default x86 release target lacks even SSE3 support (released 2004),
> >   let alone anything more modern
> > - The Rust implementation relies on LLVM to generate vectorised code;
> >   there are no stable portable SIMD intrinsics, and there may never be
> >
> > My suggestion in [1] is to generate a compilation error if building a
> > release binary without SSE3 enabled. This provides a very low barrier to
> > entry, and guides users towards the "right thing". In practice I suspect
> > most users will be able to add `target-cpu=haswell` and benefit from
> > everything up to and including AVX2.
> >
> > An alternative proposal would be to auto-select from multiple
> > implementations at runtime; however, this would effectively multiply
> > executable size and compile times, which are already problematic, by
> > each combination of features. It is tractable, but I question whether it
> > is worth optimising for a very rare breed of user that is running
> > high-performance CPU workloads on a CPU from more than a decade ago.
> > What do other people think?
> >
> > Any and all feedback welcome, preferably on the linked issue [1] to keep
> > things in one place.
> >
> > Kind Regards,
> >
> > Raphael Taylor-Davies
> >
> > [1]: https://github.com/apache/arrow-rs/issues/3485
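Both the compile-error proposal in [1] and compile-time selection hinge on which target features rustc was told to assume. A minimal sketch, assuming plain `cfg!(target_feature = ...)` checks; `compiled_simd_level` and `count_set_bits` are hypothetical names for illustration, not arrow-rs APIs:

```rust
/// Hypothetical helper (not an arrow-rs API): reports the highest x86 SIMD
/// level, among those listed, that this binary was *compiled* for.
/// `cfg!` expands to a literal `true`/`false` at compile time, so the answer
/// depends only on the target features given to rustc (for example
/// `-C target-cpu=haswell`), never on the CPU the binary later runs on.
pub fn compiled_simd_level() -> &'static str {
    if cfg!(target_feature = "avx512f") {
        "avx512f"
    } else if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "sse4.2") {
        "sse4.2"
    } else if cfg!(target_feature = "sse3") {
        "sse3"
    } else {
        // e.g. the default x86_64 target (SSE/SSE2 only), or a non-x86 target
        "baseline"
    }
}

/// Hypothetical null-count-style kernel over a packed validity bitmap.
/// When the `popcnt` target feature is enabled at compile time, LLVM can lower
/// `u64::count_ones` to the POPCNT instruction; otherwise it emits a slower
/// bit-twiddling sequence. The result is identical either way.
pub fn count_set_bits(words: &[u64]) -> u64 {
    words.iter().map(|w| u64::from(w.count_ones())).sum()
}
```

Building with `RUSTFLAGS="-C target-cpu=haswell" cargo build --release` should make `compiled_simd_level()` report `avx2`. A guard along the lines of [1] could test the same predicates with a `#[cfg(...)]` attribute on a `compile_error!` invocation.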
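For contrast, the runtime alternative (similar in spirit to the dynamic dispatch Antoine mentions for Arrow C++) could look roughly like this in Rust. This is a sketch under assumptions: `sum`, `sum_scalar` and `sum_avx2` are hypothetical, and it relies on LLVM auto-vectorising the AVX2 copy rather than on hand-written intrinsics:

```rust
/// Portable fallback: runs on any CPU. Wrapping addition avoids overflow-check
/// panics in debug builds and keeps the loop simple enough to auto-vectorise.
fn sum_scalar(values: &[i64]) -> i64 {
    values.iter().fold(0i64, |acc, &v| acc.wrapping_add(v))
}

/// The same source compiled a second time with AVX2 enabled for this function
/// only, so LLVM may auto-vectorise it with 256-bit registers.
/// Safety: callers must ensure the running CPU supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[i64]) -> i64 {
    values.iter().fold(0i64, |acc, &v| acc.wrapping_add(v))
}

/// Hypothetical dispatching entry point: picks an implementation at runtime.
pub fn sum(values: &[i64]) -> i64 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Checks the features of the CPU the binary is actually running on.
        if is_x86_feature_detected!("avx2") {
            // SAFETY: we just checked that the running CPU supports AVX2.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_scalar(values)
}
```

Every dispatched kernel is compiled once per feature level it supports, which is the multiplication of executable size and compile time Raphael describes; with many kernels and several levels, that cost adds up.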