On 03/04/2022 at 21:38, Sasha Krassovsky wrote:

>> There is concrete proof that autovectorization produces very flimsy
>> results (even on the same compiler, simply by varying the datatypes).

> As I've shown, the Vector-Vector Add kernel example is consistently
> vectorized well across compilers if written in a simple way.
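
Presumably the kernel in question is a simple contiguous loop along these
lines (a sketch, not the actual benchmark code; the __restrict qualifiers
are the usual compiler extension):

    #include <cstddef>

    // The kind of "simple" loop under discussion: contiguous inputs,
    // restrict-qualified pointers, no branching in the body. Mainstream
    // compilers typically autovectorize this at -O2/-O3.
    void VectorVectorAdd(const float* __restrict a, const float* __restrict b,
                         float* __restrict out, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
      }
    }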

Does it handle a validity bitmap efficiently? Does it handle an entire
range of datatypes? Does it handle both array and scalar inputs? If not,
how would you propose to handle all these? Chances are, you'll end up
rewriting another array of template abstractions.
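
For instance, merely adding a per-element validity check already breaks
the "simple loop" shape. A hypothetical sketch (not how Arrow actually
handles validity, but it illustrates the point):

    #include <cstddef>
    #include <cstdint>

    // Hypothetical: the same add kernel once a validity bitmap (1 bit per
    // element) is consulted per element. The bit test puts a branch in the
    // loop body, which compilers vectorize far less consistently.
    void AddWithValidity(const float* a, const float* b,
                         const std::uint8_t* validity, float* out,
                         std::size_t n) {
      for (std::size_t i = 0; i < n; ++i) {
        if (validity[i / 8] & (1u << (i % 8))) {
          out[i] = a[i] + b[i];
        }
      }
    }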

> Until I've seen a poorly-vectorized scalar kernel written as a simple for
> loop, I consider these arguments theoretical as well.

This makes little sense. The Arrow C++ codebase is not "theoretical";
it's what you are presently working on.

> It seems that we're in agreement at least in terms of concrete action for
> an initial PR: make the kernels system more SIMD-amenable and enable
> compiling source files several times, once per instruction set. Next, we
> can evaluate which kernels are worth rewriting in terms of xsimd. Does
> that sound right?

Indeed you can have an initial stab at that.
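
For the xsimd side, such a kernel might end up looking something like the
following (a minimal sketch assuming xsimd's batch API, not any existing
Arrow code); building the translation unit once per instruction set would
then yield one specialization each, to be selected at runtime:

    #include <cstddef>
    #include <xsimd/xsimd.hpp>

    // Sketch of an explicit-SIMD add kernel using xsimd's default
    // (compile-time selected) architecture. Compiling this file several
    // times with different -m flags produces one variant per instruction
    // set.
    void AddFloatSimd(const float* a, const float* b, float* out,
                      std::size_t n) {
      using batch = xsimd::batch<float>;  // lane count depends on target ISA
      constexpr std::size_t width = batch::size;
      std::size_t i = 0;
      for (; i + width <= n; i += width) {
        batch va = batch::load_unaligned(a + i);
        batch vb = batch::load_unaligned(b + i);
        (va + vb).store_unaligned(out + i);
      }
      for (; i < n; ++i) {
        out[i] = a[i] + b[i];  // scalar tail
      }
    }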

Regards

Antoine.



> Sasha
> 
> On Apr 3, 2022, at 11:47, Antoine Pitrou <anto...@python.org> wrote:

>> It would be a very significant contributor, as the inconsistency can
>> manifest in the form of up to 8-fold differences in performance (or
>> perhaps more).
