Memory binding can be viewed as opportunity for melding multiple
aggregators.
For instance, any additional aggregation comes nearly for free. Sum and
count (non zero) will be the same as either alone. Or sum and sum of
squares.
On Wed, Oct 17, 2018, 06:21 Francois Saint-Jacques <
fsaintjacq..
Don't trust that the compiler will auto-vectorize, tiny changes can have
drastic impacts.
SumScalar (benchmark-native):
│ state->total += *values++;
42 │ add(%rcx),%esi
68 │ mov%esi,(%rsp)
3 │ add0x4(%rcx),%esi
36 │ mov%esi,(%rsp)
│ ad
I'm surprised that using a stack variable has an impact, but I should
probably update the benchmarks to do that (and merge with the
SumState at the end of the function) for thoroughness. Thanks!
On Wed, Oct 17, 2018 at 9:07 AM Francois Saint-Jacques
wrote:
>
> It seems the code for the naive Scala
It seems the code for the naive Scalar example is not friendly with the
compiler auto-vectorization component. If you accumulate in a local state
(instead of SumState pointer), you'll get different results. at least with
clang++6.0.
benchmark-noavx (only SSE):
BM_SumInt32Scalar
Le 16/10/2018 à 18:55, Ted Dunning a écrit :
> It should be possible to unroll the sentinel version in many cases. For
> instance,
>
> sum += (data[i] == SENTINEL) * data[i]
>
> This doesn't work with NaN as a sentinel because 0 * NaN => NaN, but it can
> work with other values.
One shoul
It should be possible to unroll the sentinel version in many cases. For
instance,
sum += (data[i] == SENTINEL) * data[i]
This doesn't work with NaN as a sentinel because 0 * NaN => NaN, but it can
work with other values.
On Tue, Oct 16, 2018 at 9:38 AM Antoine Pitrou wrote:
>
> Hi Wes,
>
Hi Wes,
Le 16/10/2018 à 14:05, Wes McKinney a écrit :
> hi folks,
>
> I explored a bit the performance implications of using validity
> bitmaps (like the Arrow columnar format) vs. sentinel values (like
> NaN, INT32_MIN) for nulls:
>
> http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/
>