On Wed, 20 Aug 2025 10:11:47 GMT, Jatin Bhateja <[email protected]> wrote:
>> Patch optimizes Vector.slice operation with constant index using x86 ALIGNR
>> instruction.
>> It also adds a new hybrid call generator to facilitate lazy intrinsification
>> or else perform procedural inlining to prevent call overhead and boxing
>> penalties in case the fallback implementation expects to operate over
>> vectors. The existing vector API-based slice implementation is now the
>> fallback code that gets inlined in case intrinsification fails.
>>
>> Idea here is to add infrastructure support to enable intrinsification of
>> fast path for selected vector APIs, else enable inlining of fall-back
>> implementation if it's based on vector APIs. Existing call generators like
>> PredictedCallGenerator, used to handle bi-morphic inlining, already make use
>> of multiple call generators to handle hit/miss scenarios for a particular
>> receiver type. The newly added hybrid call generator is lazy and called
>> during incremental inlining optimization. It also relieves the inline
>> expander to handle slow paths, which can easily be implemented library side
>> (Java).
>>
>> Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
>>
>> Performance numbers:
>>
>>
>> System : 13th Gen Intel(R) Core(TM) i3-1315U
>>
>> Baseline:
>> Benchmark                                                (size)   Mode  Cnt      Score  Error  Units
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2   9444.444         ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  10009.319         ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9081.926         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   6085.825         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   6505.378         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   1642.784         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1474.808         ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  10399.394         ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  10502.894         ops/ms
>> VectorSliceB...
>
> Jatin Bhateja has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Update callGenerator.hpp copyright year

src/hotspot/share/classfile/vmIntrinsics.hpp line 1178:

> 1176:     "Ljdk/internal/vm/vector/VectorSupport$Vector;" \
> 1177:     "Ljdk/internal/vm/vector/VectorSupport$VectorSliceOp;)" \
> 1178:     "Ljdk/internal/vm/vector/VectorSupport$Vector;") \

Seems this `\` is not aligned?

src/hotspot/share/classfile/vmIntrinsics.hpp line 1179:

> 1177:     "Ljdk/internal/vm/vector/VectorSupport$VectorSliceOp;)" \
> 1178:     "Ljdk/internal/vm/vector/VectorSupport$Vector;") \
> 1179:     do_name(vector_slice_name, "sliceOp") \

ditto

test/hotspot/jtreg/compiler/vectorapi/TestSliceOptValueTransforms.java line 45:

> 43:     public static final VectorSpecies<Short> SSP = ShortVector.SPECIES_PREFERRED;
> 44:     public static final VectorSpecies<Integer> ISP = IntVector.SPECIES_PREFERRED;
> 45:     public static final VectorSpecies<Long> LSP = LongVector.SPECIES_PREFERRED;

The implementation supports floating point types, but why doesn't the test include fp types?

test/hotspot/jtreg/compiler/vectorapi/TestSliceOptValueTransforms.java line 122:

> 120:                 .intoArray(bdst, i);
> 121:         }
> 122:     }

Since this optimization also benefits the slice variant with mask, could you add some tests for it as well?

test/micro/org/openjdk/bench/jdk/incubator/vector/VectorSliceBenchmark.java line 59:

> 57:     static final VectorSpecies<Short> sspecies = ShortVector.SPECIES_PREFERRED;
> 58:     static final VectorSpecies<Integer> ispecies = IntVector.SPECIES_PREFERRED;
> 59:     static final VectorSpecies<Long> lspecies = LongVector.SPECIES_PREFERRED;

Ditto, no fp types?

test/micro/org/openjdk/bench/jdk/incubator/vector/VectorSliceBenchmark.java line 133:

> 131:                 .intoArray(bdst, i);
> 132:         }
> 133:     }

Ditto, add a benchmark for the slice variant with mask?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378092410
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378093047
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378310217
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378337340
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378312763
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2378342519
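
For the fp and masked cases asked about above, a scalar reference along these lines could serve as the expected-value oracle in the jtreg test. This is only a sketch: the class and method names (`SliceReference`, `slice`) are hypothetical and not part of the patch, and it assumes the documented `Vector.slice` semantics as I read them, i.e. result lane `i` comes from lane `origin + i` of the first vector and continues into the second vector once that index runs past the vector length, with the masked variant zeroing lanes whose mask bit is unset.

```java
/** Scalar model of two-vector slice semantics; all names here are hypothetical. */
final class SliceReference {
    private SliceReference() {}

    /**
     * Lane i of the result is a[origin + i] while that index is in range,
     * and continues into b[origin + i - a.length] afterwards.
     */
    static float[] slice(float[] a, float[] b, int origin) {
        int vlen = a.length;
        if (b.length != vlen) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        if (origin < 0 || origin > vlen) {
            throw new IndexOutOfBoundsException("origin out of range: " + origin);
        }
        float[] r = new float[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = origin + i;
            r[i] = (j < vlen) ? a[j] : b[j - vlen];
        }
        return r;
    }

    /** Masked variant (assumed semantics): unset lanes are replaced by zero. */
    static float[] slice(float[] a, float[] b, int origin, boolean[] mask) {
        float[] r = slice(a, b, origin);
        for (int i = 0; i < r.length; i++) {
            if (!mask[i]) {
                r[i] = 0.0f;
            }
        }
        return r;
    }
}
```

A test kernel could then compare the array written by `FloatVector.slice(...).intoArray(...)` against `SliceReference.slice(...)` computed over the same window, for each constant origin under test.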
