Hi folks, I explored the performance implications of using validity bitmaps (as in the Arrow columnar format) versus sentinel values (such as NaN or INT32_MIN) to represent nulls:
http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/

The vectorization results may be of interest to those implementing analytic functions targeting the Arrow memory format. There are probably other optimizations that can be employed, too.

Caveat: it's entirely possible I made some mistakes in my code. I checked the various implementations for correctness only, and did not dig too deeply beyond that.

- Wes