On Tue, 9 Jan 2024 06:13:44 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Yes, IF it is vectorized, then there is no difference between high and low
>> density. My concern was more whether vectorization is preferable over the
>> scalar alternative in the low-density case, where branch prediction is more
>> stable.
>
> At runtime we do need to scan the entire mask to pick the compressible lane
> corresponding to each set mask bit. Thus the loop overhead of the mask
> compare (BTW masks are held in a vector register for AVX2 targets) and jump
> will be incurred anyway. In addition, for a sparsely populated mask we may
> incur an additional misprediction penalty for not taking the if block, which
> extracts an element from the appropriate source vector lane and inserts it
> into the destination vector lane.
> Overall, the vector solution will win for the most common cases with varying
> masks and also for very sparsely populated masks. Here is the result of
> setting just a single mask bit.
>
>
>     @Benchmark
>     public void fuzzyFilterIntColumn() {
>         int i = 0;
>         int j = 0;
>         long maskctr = 1;
>         int endIndex = ispecies.loopBound(size);
>         for (; i < endIndex; i += ispecies.length()) {
>             IntVector vec = IntVector.fromArray(ispecies, intinCol, i);
>             VectorMask<Integer> pred = VectorMask.fromLong(ispecies, 1);
>             vec.compress(pred).intoArray(intoutCol, j);
>             j += pred.trueCount();
>         }
>     }
>
>
> Baseline:
> Benchmark                                   (size)   Mode  Cnt     Score   Error   Units
> ColumnFilterBenchmark.fuzzyFilterIntColumn    1024  thrpt    2   379.059          ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    2047  thrpt    2   188.355          ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    4096  thrpt    2    95.315          ops/ms
>
>
> Withopt:
> Benchmark                                   (size)   Mode  Cnt     Score   Error   Units
> ColumnFilterBenchmark.fuzzyFilterIntColumn    1024  thrpt    2  7390.074          ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    2047  thrpt    2  3483.247          ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    4096  thrpt    2  1823.817          ops/ms

Nice, thanks for the data!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446138902
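
For readers following the thread, the scalar alternative under discussion (scan every mask bit, and on a set bit extract the source lane and insert it into the next destination lane) can be sketched in plain Java roughly as below. This is only an illustration of the semantics, not the code from the PR or what the JIT emits: the names `CompressSketch` and `compressLanes` are hypothetical, arrays stand in for vector lanes, and unlike `vec.compress(pred).intoArray(...)` it writes only the selected lanes rather than a full vector with zeroed trailing lanes.

```java
import java.util.Arrays;

// Hypothetical illustration: scalar emulation of one per-vector compress step.
final class CompressSketch {
    // Copies the lanes of src[srcOff .. srcOff + lanes) whose mask bit is set
    // into dst starting at dstOff, preserving lane order.
    // Returns the number of lanes copied (the mask's true count within 'lanes').
    static int compressLanes(int[] src, int srcOff, int lanes,
                             long mask, int[] dst, int dstOff) {
        int written = 0;
        // Every lane is visited regardless of how sparse the mask is.
        for (int lane = 0; lane < lanes; lane++) {
            // Data-dependent branch: taken only for set mask bits.
            if ((mask & (1L << lane)) != 0) {
                dst[dstOff + written++] = src[srcOff + lane];
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int[] in = {10, 20, 30, 40, 50, 60, 70, 80};
        int[] out = new int[in.length];
        // Single set mask bit, as in the benchmark quoted above.
        int n = compressLanes(in, 0, 8, 0b0000_0001L, out, 0);
        System.out.println(n + " " + Arrays.toString(out));
    }
}
```

The loop trip count depends only on the lane count, which matches the point made above that the mask scan cost is paid even for a very sparse mask.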