Hi folks,

I looked a bit into the performance implications of using validity
bitmaps (as in the Arrow columnar format) vs. sentinel values (such as
NaN or INT32_MIN) to represent nulls:

http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/

The vectorization results may be of interest to those implementing
analytic functions targeting the Arrow memory format. There are
probably other optimizations that can be employed, too.
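
For anyone who has not looked at the two representations side by
side, here is a rough sketch of a null-aware sum under each scheme.
It is plain Python for readability only, not the benchmark code from
the post, and it says nothing about performance. The function names
(sum_sentinel, sum_bitmap, pack_validity) are made up for this
illustration; the bitmap uses least-significant-bit numbering, which
is how the Arrow format lays out validity bits.

```python
import math


def sum_sentinel(values):
    """Sum floats, treating NaN as the null marker (sentinel approach)."""
    total = 0.0
    for v in values:
        if not math.isnan(v):  # the null check inspects the value itself
            total += v
    return total


def pack_validity(is_valid):
    """Pack a list of booleans into an LSB-first validity bitmap."""
    bitmap = bytearray((len(is_valid) + 7) // 8)
    for i, ok in enumerate(is_valid):
        if ok:
            bitmap[i >> 3] |= 1 << (i & 7)
    return bytes(bitmap)


def sum_bitmap(values, validity):
    """Sum the values whose validity bit is set (bitmap approach)."""
    total = 0.0
    for i, v in enumerate(values):
        # Bit i lives in byte i // 8, at position i % 8 within that byte.
        if (validity[i >> 3] >> (i & 7)) & 1:
            total += v
    return total


# Same logical column [1.0, null, 3.0, 4.5] in both representations.
sentinel_col = [1.0, float("nan"), 3.0, 4.5]
bitmap_col = [1.0, 0.0, 3.0, 4.5]  # the slot under a null is arbitrary
validity = pack_validity([True, False, True, True])

print(sum_sentinel(sentinel_col))
print(sum_bitmap(bitmap_col, validity))
```

Both calls sum the same three non-null values. The difference is that
the sentinel version gives up one value of the type's range to mean
"null", while the bitmap version keeps the full range and pays one
extra bit per value.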

Caveat: it's entirely possible I made some mistakes in my code. I
checked the various implementations for correctness only, and did not
dig too deeply beyond that.

- Wes
