On Mon, 8 Jun 2026 08:13:18 GMT, Fei Gao <[email protected]> wrote:

>>> @fg1417 Nice progress, I had some responses and new comments above. Main
>>> new idea: what about Vector API vectors that create these patterns, do they
>>> also get optimized by your changes now?
>>
>> Hi @eme64,
>> Thanks for your reviewing!
>> I've already created the Vector API benchmark locally, but I'm currently
>> waiting for access to testing resources. Sorry for the delay, and thanks for
>> your patience.
>
>> @fg1417 Nice progress, I had some responses and new comments above. Main new
>> idea: what about Vector API vectors that create these patterns, do they also
>> get optimized by your changes now?
>
> Hi @eme64, thanks for your patience.
>
> I’ve pushed the Vector API microbenchmarks in
> `test/micro/org/openjdk/bench/jdk/incubator/vector/LongVectorReduction.java`
> that mirror the auto-vectorization patterns, along with the corresponding IR
> test cases. The change also benefits these Vector API microbenchmarks.
>
> On an `Arm Neoverse V2` platform, I observed the following results:
>
>
> Benchmark                                (size)   Mode  Cnt   Units  uplift
> LongVectorReduction.addBig                  512  thrpt    5  ops/ms   2.97%
> LongVectorReduction.addBig                 2048  thrpt    5  ops/ms   0.37%
> LongVectorReduction.addDotProduct           512  thrpt    5  ops/ms  50.99%
> LongVectorReduction.addDotProduct          2048  thrpt    5  ops/ms  49.95%
> LongVectorReduction.addDotProductShared     512  thrpt    5  ops/ms   0.29%
> LongVectorReduction.addDotProductShared    2048  thrpt    5  ops/ms  -0.01%
> LongVectorReduction.ifElsePhiAdd            512  thrpt    5  ops/ms   8.50%
> LongVectorReduction.ifElsePhiAdd           2048  thrpt    5  ops/ms  16.04%
> LongVectorReduction.ifElsePhiSub            512  thrpt    5  ops/ms  10.55%
> LongVectorReduction.ifElsePhiSub           2048  thrpt    5  ops/ms  11.78%
> LongVectorReduction.subDotProduct           512  thrpt    5  ops/ms  50.74%
> LongVectorReduction.subDotProduct          2048  thrpt    5  ops/ms  50.49%
>
>
> Thanks!
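
For readers skimming the thread, here is a rough scalar sketch of the kind of loop shape the `addDotProduct` and `subDotProduct` names suggest. It is only an illustration under assumptions: the class name, method bodies, and array inputs are hypothetical and are not copied from `LongVectorReduction.java` or from the patch.

```java
// Hypothetical illustration only; not the benchmark source from the PR.
public class DotProductReductionSketch {

    // Assumed "add" form: accumulate the element-wise products into one long.
    static long addDotProduct(long[] a, long[] b) {
        long acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc += a[i] * b[i];
        }
        return acc;
    }

    // Assumed "sub" form: subtract each element-wise product from the accumulator.
    static long subDotProduct(long[] a, long[] b) {
        long acc = 0;
        for (int i = 0; i < a.length; i++) {
            acc -= a[i] * b[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = {1, 2, 3};
        long[] b = {4, 5, 6};
        System.out.println(addDotProduct(a, b));
        System.out.println(subDotProduct(a, b));
    }
}
```

Both loops carry a single accumulator across iterations, which is what makes them reductions; the actual benchmarks may differ in detail.
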
> @fg1417 Thanks for the updates and benchmarks! I think the code is
> reasonable. I gave the PR another scan :)

@eme64 Thanks for the review!

I’ve now extended the patch to cover masked operations as well, and added the corresponding IR test cases and microbenchmarks in the latest commit.

On an `Arm Neoverse V2` system, I observed the following improvements:

Benchmark                                (size)   Mode  Cnt   Units  Uplift
LongVectorReduction.addDotProductMasked     512  thrpt    5  ops/ms  49.65%
LongVectorReduction.addDotProductMasked    2048  thrpt    5  ops/ms  50.11%
LongVectorReduction.subDotProductMasked     512  thrpt    5  ops/ms  50.47%
LongVectorReduction.subDotProductMasked    2048  thrpt    5  ops/ms  49.67%

Please let me know if you have any comments or further suggestions.

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30237#issuecomment-4680462400
