I agree that a potentially inconsistent experience is a problem, but I
disagree that SIMD would be the root of the problem, or even a
significant contributor to it.
The problem is essentially: "How can we be sure that all compilers will
generate good code on all platforms?" As you said, we have a lot of
platforms, so verifying that is not really practical.
I think that since we already try to use autovectorization in the kernels
subsystem, none of these problems are new. Not enabling AVX2 when it
would be simple enough to do so is akin to disabling compiler optimizations
because they may make the program better for only some people. On the other
hand, rewriting everything to be explicitly vectorized is also much more
work than just enabling more instruction sets for autovectorization. And
lastly, who can say that xsimd will be compiled properly and perform better
than autovectorization?

So overall, no matter what, we'll have to rewrite the kernel system to be
more SIMD-amenable, and enable the relevant instruction sets in the build
system. I don't see how writing everything without xsimd would introduce any
problems that wouldn't exist with xsimd. I would be fine keeping xsimd
around to give us opportunities to further tune performance. At the very
least, for an initial PR, I would like to keep everything simpler. We can
then evaluate xsimd-fying the kernels separately.

Sasha

On Thu, Mar 31, 2022 at 12:36 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Le 31/03/2022 à 09:19, Sasha Krassovsky a écrit :
> >> As I showed, those auto-vectorized kernels may be vectorized only in
> >> some situations, depending on the compiler version, the input datatypes...
> >
> > I would more than anything interpret the fact that that code was
> > vectorized at all as an amazing win for compiler technology, as it’s a very
> > abstract way of gluing together different pieces of code using templates
> > and lambda expressions.
>
> That's a possible interpretation, but it doesn't really help the bottom
> line :-)
>
> > A lot of the kernels that we would be writing are probably basic unit
> > tests [1] for the compiler’s vectorizer, and I’ve hopefully shown that even
> > very old versions do just fine.
> >
> > Anyway, in the worst case we will eventually write every kernel with
> > xsimd, and have the autovectorized kernels temporarily there. If we find
> > that performance is good on our platforms, then we can skip the “rewrite in
> > xsimd” step.
>
> "Our platforms" are rather broad however. We have binary packages for
> Windows, macOS, Linux, using several compilers and toolchains (because
> there are R packages, Python packages and sometimes C++ packages). For
> example, on Windows the R packages are built with different versions of
> MinGW/gcc depending on the R version, while the Python packages are
> built with some version of MSVC (which might be of a different version
> depending on whether it's a conda package or a Python wheel, I'm not sure).
>
> And there are of course the different architectures: we support x86 and
> arm64 for both macOS and Linux, for example; we might even have ppc64
> packages of some sort (?).
>
> Regards
>
> Antoine.
>
