Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-17 Thread Ted Dunning
Memory binding can be viewed as opportunity for melding multiple aggregators. For instance, any additional aggregation comes nearly for free. Sum and count (non zero) will be the same as either alone. Or sum and sum of squares. On Wed, Oct 17, 2018, 06:21 Francois Saint-Jacques < fsaintjacq..

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-17 Thread Francois Saint-Jacques
Don't trust that the compiler will auto-vectorize, tiny changes can have drastic impacts. SumScalar (benchmark-native): │ state->total += *values++; 42 │ add(%rcx),%esi 68 │ mov%esi,(%rsp) 3 │ add0x4(%rcx),%esi 36 │ mov%esi,(%rsp) │ ad

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-17 Thread Wes McKinney
I'm surprised that using a stack variable has an impact, but I should probably update the benchmarks to do that (and merge with the SumState at the end of the function) for thoroughness. Thanks! On Wed, Oct 17, 2018 at 9:07 AM Francois Saint-Jacques wrote: > > It seems the code for the naive Scala

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-17 Thread Francois Saint-Jacques
It seems the code for the naive Scalar example is not friendly with the compiler auto-vectorization component. If you accumulate in a local state (instead of SumState pointer), you'll get different results. at least with clang++6.0. benchmark-noavx (only SSE): BM_SumInt32Scalar

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-16 Thread Antoine Pitrou
Le 16/10/2018 à 18:55, Ted Dunning a écrit : > It should be possible to unroll the sentinel version in many cases. For > instance, > > sum += (data[i] == SENTINEL) * data[i] > > This doesn't work with NaN as a sentinel because 0 * NaN => NaN, but it can > work with other values. One shoul

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-16 Thread Ted Dunning
It should be possible to unroll the sentinel version in many cases. For instance, sum += (data[i] == SENTINEL) * data[i] This doesn't work with NaN as a sentinel because 0 * NaN => NaN, but it can work with other values. On Tue, Oct 16, 2018 at 9:38 AM Antoine Pitrou wrote: > > Hi Wes, >

Re: Algorithmic explorations of bitmaps vs. sentinel values

2018-10-16 Thread Antoine Pitrou
Hi Wes, Le 16/10/2018 à 14:05, Wes McKinney a écrit : > hi folks, > > I explored a bit the performance implications of using validity > bitmaps (like the Arrow columnar format) vs. sentinel values (like > NaN, INT32_MIN) for nulls: > > http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/ >