Hi,

I added AVX-512 kernels some time ago, but they were removed for a
reason I don't agree with. (No objections, that was the maintainers'
decision.)

I totally agree with you: I prefer feature-gated SIMD supersets that
are selected at compile time. I was working on compile-time selection
but dropped the work.
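
To illustrate what compile-time selection can look like, here is a
minimal sketch. It is not the removed arrow-rs code: the kernel name
`sum_f32`, the helper `selected_kernel`, and the choice of AVX2 as the
gated feature are all assumptions made for the example.

```rust
// Hypothetical kernel: sum a slice of f32 values.
// The function names and the choice of AVX2 are illustrative only.

/// Selected when the build enables AVX2, e.g. with
/// RUSTFLAGS="-C target-cpu=haswell".
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
pub fn sum_f32(values: &[f32]) -> f32 {
    // Eight independent accumulators give LLVM an easy pattern to
    // auto-vectorise into 256-bit lanes.
    let mut acc = [0.0f32; 8];
    let chunks = values.chunks_exact(8);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// Portable fallback for every other build.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
pub fn sum_f32(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Reports which variant this build selected.
pub fn selected_kernel() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        "avx2"
    } else {
        "scalar"
    }
}
```

Only one `sum_f32` exists in any given binary, so there is no runtime
branch and no extra code size; the cost is that the binary must be
rebuilt to target a different CPU.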

For example, a simple revert would be enough to bring AVX-512 back in
the Rust implementation of Arrow.

Best,
Theo


On Sat, Jan 7, 2023, 15:36 Antoine Pitrou <anto...@python.org> wrote:

>
> Hi,
>
> For reference, the C++ implementation is compiled by default with SSE4.2
> enabled. We had some rare bug reports of people using very old CPUs
> where Arrow C++ would crash (for example for lack of the POPCNT instruction,
> which is very useful for fast null count computation).
>
> We also have some dynamic dispatch for select routines where AVX2 or
> AVX512 paths are available.
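
For comparison, dynamic dispatch of this kind can be sketched in Rust
as below. This is not the Arrow C++ mechanism; the function names are
hypothetical, and POPCNT is used instead of AVX2 or AVX512 only because
it keeps the example short.

```rust
// Hypothetical kernel: count set bits in a validity bitmap.
// Names are illustrative and not taken from any Arrow code base.

/// Portable path, safe to run on any CPU.
fn count_set_bits_portable(bitmap: &[u64]) -> u64 {
    bitmap.iter().map(|w| u64::from(w.count_ones())).sum()
}

/// Same logic, but compiled with POPCNT enabled for this function only.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "popcnt")]
unsafe fn count_set_bits_popcnt(bitmap: &[u64]) -> u64 {
    bitmap.iter().map(|w| u64::from(w.count_ones())).sum()
}

/// Picks an implementation at runtime based on what the CPU reports.
pub fn count_set_bits(bitmap: &[u64]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("popcnt") {
            // SAFETY: the check above confirms the CPU supports POPCNT.
            return unsafe { count_set_bits_popcnt(bitmap) };
        }
    }
    count_set_bits_portable(bitmap)
}
```

Both variants are compiled into the binary, which is the source of the
size and compile-time cost discussed further down the thread.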
>
> AVX2 by default is probably too contentious for the time being, IMHO.
>
> Regards
>
> Antoine.
>
>
> On 07/01/2023 at 13:08, Raphael Taylor-Davies wrote:
> > Hi,
> >
> > It is fairly common to see binaries in the wild making use of the Rust
> > arrow libraries compiled with extremely limited SIMD support enabled. As
> > I imagine others in the community have run into this before, I thought
> > I'd send an email to solicit thoughts.
> >
> > There are a couple of things that make the Rust implementation
> > particularly susceptible to this problem:
> >
> > - Rust lacks a stable ABI, and so all builds are from source
> > - The default x86 release target lacks even SSE3 support (released 2004)
> > let alone anything more modern
> > - The Rust implementation relies on LLVM to generate vectorised code;
> > there are no stable SIMD intrinsics and there may never be any
> >
> > My suggestion in [1] is to generate a compilation error if building a
> > release binary without SSE3 enabled. This provides a very low barrier to
> > entry, and guides users towards the "right thing". In practice I suspect
> > most users will be able to add `target-cpu=haswell` and benefit from
> > everything up to and including AVX2.
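
A guard of the kind suggested here could look roughly like the sketch
below. It is an assumption about the shape of such a check, not the
code proposed in [1]: the `require_simd` Cargo feature is a made-up
name, and `not(debug_assertions)` is used as an approximation of
"release build".

```rust
// Hypothetical guard; the `require_simd` Cargo feature is an assumed name,
// added so the check can be switched off, and is not part of arrow-rs.
#[cfg(all(
    feature = "require_simd",
    target_arch = "x86_64",
    not(debug_assertions),
    not(target_feature = "sse3")
))]
compile_error!(
    "release build without SSE3; consider RUSTFLAGS=\"-C target-cpu=haswell\""
);

/// Reports whether this build was compiled with SSE3 enabled.
pub fn built_with_sse3() -> bool {
    cfg!(target_feature = "sse3")
}
```

In a real crate the feature would need to be declared in Cargo.toml,
and the error message is where users get pointed at the `target-cpu`
setting.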
> >
> > An alternative proposal would be to auto-select from multiple
> > implementations at runtime, however, this will effectively multiply
> > executable size and compile times, which are already problematic, by
> > each combination of features. It is tractable, but I feel it would be
> > optimising for a very rare breed of user who is running high-performance
> > CPU workloads on a CPU from more than a decade ago. I'm not sure what
> > other people think?
> >
> > Any and all feedback welcome, preferably on the linked issue [1] to keep
> > things in one place.
> >
> > Kind Regards,
> >
> > Raphael Taylor-Davies
> >
> > [1]: https://github.com/apache/arrow-rs/issues/3485
> >
>
