On Tue, 9 Jan 2024 06:13:44 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> Yes, if it is vectorized, then there is no difference between high and low 
>> density. My concern was more whether vectorization is preferable to the 
>> scalar alternative in the low-density case, where branch prediction is more 
>> stable.
>
> At runtime we need to scan the entire mask to pick the compressible lanes 
> corresponding to the set mask bits. Thus the loop overhead of the mask 
> compare (BTW, masks are held in a vector register on AVX2 targets) and jump 
> will be incurred anyway. In addition, for a sparsely populated mask we may 
> incur an additional misprediction penalty for not taking the if block, which 
> extracts an element from the appropriate source vector lane and inserts it 
> into the destination vector lane. Overall, the vector solution will win for 
> the most common cases with varying masks, and also for very sparsely 
> populated masks. Here is the result of setting just a single mask bit. 
> 
> 
>     @Benchmark
>     public void fuzzyFilterIntColumn() {
>         int i = 0;
>         int j = 0;
>         long maskctr = 1;
>         int endIndex = ispecies.loopBound(size);
>         for (; i < endIndex; i += ispecies.length()) {
>             IntVector vec = IntVector.fromArray(ispecies, intinCol, i);
>             // Only lane 0 is set: the sparsest non-empty mask.
>             VectorMask<Integer> pred = VectorMask.fromLong(ispecies, 1);
>             vec.compress(pred).intoArray(intoutCol, j);
>             j += pred.trueCount();
>         }
>     }
> 
> 
> Baseline:
> Benchmark                                   (size)   Mode  Cnt     Score   Error   Units
> ColumnFilterBenchmark.fuzzyFilterIntColumn    1024  thrpt    2   379.059           ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    2047  thrpt    2   188.355           ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    4096  thrpt    2    95.315           ops/ms
> 
> 
> With opt:
> Benchmark                                   (size)   Mode  Cnt     Score   Error   Units
> ColumnFilterBenchmark.fuzzyFilterIntColumn    1024  thrpt    2  7390.074           ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    2047  thrpt    2  3483.247           ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    4096  thrpt    2  1823.817           ops/ms

Nice, thanks for the data!
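
For reference, the scalar alternative discussed above can be sketched roughly
as follows. This is only an illustration under stated assumptions: the class
and method names (ScalarCompressSketch, compressBlock, filter) are
hypothetical and are not taken from the PR or the benchmark, and unlike
compress(pred).intoArray(...) the sketch stores only the selected lanes
instead of a full vector. It shows the per-lane mask test and conditional
store whose branch behaviour depends on mask density.

```java
// Hypothetical illustration; not code from the PR or from ColumnFilterBenchmark.
class ScalarCompressSketch {

    // Copies the elements of one block of `lanes` ints whose mask bit is set
    // into `out`, packed contiguously from outOff. Returns the number copied.
    static int compressBlock(int[] in, int inOff, long maskBits, int lanes,
                             int[] out, int outOff) {
        int j = outOff;
        for (int lane = 0; lane < lanes; lane++) {
            // One data-dependent branch per lane: rarely taken for a sparse mask.
            if (((maskBits >>> lane) & 1L) != 0) {
                out[j++] = in[inOff + lane];
            }
        }
        return j - outOff;
    }

    // Applies the same mask to every full block of `lanes` elements,
    // mirroring the shape of the benchmark loop. The tail is ignored.
    static int filter(int[] in, long maskBits, int lanes, int[] out) {
        int j = 0;
        int end = in.length - (in.length % lanes);
        for (int i = 0; i < end; i += lanes) {
            j += compressBlock(in, i, maskBits, lanes, out, j);
        }
        return j;
    }
}
```

With a single set bit, as in the benchmark above, the branch in compressBlock
is taken once per block and skipped for every other lane.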

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446138902
