It should be possible to unroll the sentinel version in many cases. For instance:

    sum += (data[i] != SENTINEL) * data[i]

This doesn't work with NaN as a sentinel because 0 * NaN => NaN, but it can work with other values.

On Tue, Oct 16, 2018 at 9:38 AM Antoine Pitrou <anto...@python.org> wrote:
>
> Hi Wes,
>
> On 16/10/2018 at 14:05, Wes McKinney wrote:
> > hi folks,
> >
> > I explored a bit the performance implications of using validity
> > bitmaps (like the Arrow columnar format) vs. sentinel values (like
> > NaN, INT32_MIN) for nulls:
> >
> > http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/
> >
> > The vectorization results may be of interest to those implementing
> > analytic functions targeting the Arrow memory format. There's probably
> > some other optimizations that can be employed, too.
>
> This is a nice write-up. It may also be possible to further speed up
> things using explicit SIMD operations.
>
> For the non-null case, it should be relatively doable, see e.g.
> https://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/int/reduce_add.html
> or
> https://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/fp/reduce_add.html
> .
>
> For the with-nulls case, it might be possible to do something with SIMD
> masks, but I'm not competent to propose anything concrete :-)
>
> Regards
>
> Antoine.
>
>
> >
> > Caveat: it's entirely possible I made some mistakes in my code. I
> > checked the various implementations for correctness only, and did not
> > dig too deeply beyond that.
> >
> > - Wes
> >
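For reference, the idea of replacing the null branch with a 0/1 mask can be sketched in C as below. This is a minimal sketch under stated assumptions, not code from the blog post: INT32_MIN as the integer null sentinel, NaN as the floating-point one, and an Arrow-style validity bitmap (bit i, least-significant bit first, set when slot i is valid) for comparison. The function names are hypothetical. For the NaN case it swaps the multiply for a select, since 0 * NaN is NaN.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed null sentinel for int32 columns (INT32_MIN, one of the examples in the thread). */
#define SENTINEL INT32_MIN

/* Sentinel version: the comparison evaluates to 0 or 1, so a null slot
   contributes 0 * INT32_MIN == 0. With no data-dependent branch the compiler
   is free to unroll/vectorize the loop (not guaranteed). */
int64_t sum_int32_sentinel(const int32_t *data, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += (int64_t)(data[i] != SENTINEL) * data[i];
    }
    return sum;
}

/* NaN-sentinel version: the multiply trick fails because 0.0 * NaN is NaN,
   so use a select instead, which compilers can often lower to a
   compare-and-blend. Caution: -ffast-math (via -ffinite-math-only) may let
   the compiler assume isnan() is always false, silently breaking this. */
double sum_double_nan_sentinel(const double *data, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double x = data[i];
        sum += isnan(x) ? 0.0 : x;
    }
    return sum;
}

/* Bitmap version for comparison, assuming an Arrow-style validity bitmap:
   bit i (least-significant bit first within each byte) is 1 when slot i is
   valid and 0 when it is null. */
int64_t sum_int32_bitmap(const int32_t *data, const uint8_t *valid_bits, size_t n) {
    assert(n == 0 || (data != NULL && valid_bits != NULL));
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        int64_t is_valid = (valid_bits[i >> 3] >> (i & 7)) & 1;
        sum += is_valid * data[i];
    }
    return sum;
}
```

Unlike the integer loops, the double reduction usually only auto-vectorizes when the compiler is allowed to reassociate floating-point additions, which can change rounding; inspecting the generated assembly is the reliable way to confirm what a given compiler actually does with each loop.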