Hi,

This patch optimizes the non-subword vector compress and expand APIs for x86 targets that support only AVX2. Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2 instruction set. Compress and expand are used very frequently in columnar database filter operations.

The implementation uses a lookup table to record permute indices. The table index is computed from the mask argument of the compress/expand operation.

Following are the performance numbers from the JMH microbenchmark included with the patch.

System: Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)

Baseline:
Benchmark                                 (size)   Mode  Cnt     Score   Error   Units
ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2   142.767          ops/ms
ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2    71.436          ops/ms
ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2    35.992          ops/ms
ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2   182.151          ops/ms
ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2    91.096          ops/ms
ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2    44.757          ops/ms
ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2   184.099          ops/ms
ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2    91.981          ops/ms
ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2    45.170          ops/ms
ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2   148.017          ops/ms
ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2    73.516          ops/ms
ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2    36.844          ops/ms

With optimization:
Benchmark                                 (size)   Mode  Cnt     Score   Error   Units
ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707          ops/ms
ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072          ops/ms
ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898          ops/ms
ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195          ops/ms
ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229          ops/ms
ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665          ops/ms
ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384          ops/ms
ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2  1791.170          ops/ms
ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2   974.888          ops/ms
ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2  1128.281          ops/ms
ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2   686.334          ops/ms
ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2   337.170          ops/ms
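
For readers unfamiliar with the lookup-table idea mentioned above, here is a minimal scalar sketch in Java. It is not the code from the patch: the class name `CompressLutSketch`, the 8-lane width, the `PERMUTE_TABLE` layout, and the `-1` sentinel for unused lanes are all assumptions made for illustration. The mask selects a precomputed row of permute indices, and that row drives a single gather-style shuffle of the source lanes.

```java
import java.util.Arrays;

// Illustrative only: a scalar model of mask-indexed permute tables.
// All names and the table layout here are hypothetical, not taken from the patch.
final class CompressLutSketch {
    // Assumed lane count, e.g. eight 32-bit lanes in a 256-bit vector.
    static final int LANES = 8;

    // PERMUTE_TABLE[mask][i] is the source lane for destination lane i,
    // or -1 when destination lane i should be zeroed.
    static final int[][] PERMUTE_TABLE = buildTable();

    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            Arrays.fill(table[mask], -1);
            int dst = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][dst++] = lane; // pack selected lanes to the front
                }
            }
        }
        return table;
    }

    // compress: lanes whose mask bit is set move to the low lanes, the rest are zero.
    static int[] compress(int[] src, int mask) {
        int[] idx = PERMUTE_TABLE[mask & ((1 << LANES) - 1)];
        int[] out = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            out[i] = idx[i] >= 0 ? src[idx[i]] : 0;
        }
        return out;
    }

    // expand: the low lanes of src are scattered to the lanes whose mask bit is set.
    static int[] expand(int[] src, int mask) {
        int[] idx = PERMUTE_TABLE[mask & ((1 << LANES) - 1)];
        int[] out = new int[LANES];
        for (int i = 0; i < LANES && idx[i] >= 0; i++) {
            out[idx[i]] = src[i];
        }
        return out;
    }
}
```

In this sketch the table costs 256 rows of 8 indices and is built once; each compress or expand then needs only a table load and one permute-like pass, with no per-lane branching on the mask. A vectorized version would presumably feed the loaded row to a permute instruction, but how the patch lays out its tables and handles expand is not shown here.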

Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.

Changes: https://git.openjdk.org/jdk/pull/17261/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8322768
  Stats: 336 lines in 10 files changed: 323 ins; 8 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/17261.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261

PR: https://git.openjdk.org/jdk/pull/17261