Le 16/10/2018 à 18:55, Ted Dunning a écrit : > It should be possible to unroll the sentinel version in many cases. For > instance, > > sum += (data[i] == SENTINEL) * data[i] > > This doesn't work with NaN as a sentinel because 0 * NaN => NaN, but it can > work with other values.
One should also explore whether the multiplication pattern is really optimal. Perhaps a condition would work better. The compiler may be able to turn it into a cheap conditional move (to suppress branch misprediction costs). Regards Antoine. > > > On Tue, Oct 16, 2018 at 9:38 AM Antoine Pitrou <anto...@python.org> wrote: > >> >> Hi Wes, >> >> Le 16/10/2018 à 14:05, Wes McKinney a écrit : >>> hi folks, >>> >>> I explored a bit the performance implications of using validity >>> bitmaps (like the Arrow columnar format) vs. sentinel values (like >>> NaN, INT32_MIN) for nulls: >>> >>> http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/ >>> >>> The vectorization results may be of interest to those implementing >>> analytic functions targeting the Arrow memory format. There's probably >>> some other optimizations that can be employed, too. >> >> This is a nice write-up. It may also possible to further speed up >> things using explicit SIMD operations. >> >> For the non-null case, it should be relatively doable, see e.g. >> https://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/int/reduce_add.html >> or >> https://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/fp/reduce_add.html >> . >> >> For the with-nulls case, it might be possible to do something with SIMD >> masks, but I'm not competent to propose anything concrete :-) >> >> Regards >> >> Antoine. >> >> >>> >>> Caveat: it's entirely possible I made some mistakes in my code. I >>> checked the various implementations for correctness only, and did not >>> dig too deeply beyond that. >>> >>> - Wes >>> >> >