On Wed, 20 Aug 2025 10:11:47 GMT, Jatin Bhateja <[email protected]> wrote:
>> Patch optimizes Vector.slice operation with constant index using x86 ALIGNR
>> instruction.
>> It also adds a new hybrid call generator to facilitate lazy intrinsification
>> or else perform procedural inlining to prevent call overhead and boxing
>> penalties in case the fallback implementation expects to operate over
>> vectors. The existing vector API-based slice implementation is now the
>> fallback code that gets inlined in case intrinsification fails.
>>
>> Idea here is to add infrastructure support to enable intrinsification of
>> fast path for selected vector APIs, else enable inlining of fall-back
>> implementation if it's based on vector APIs. Existing call generators like
>> PredictedCallGenerator, used to handle bi-morphic inlining, already make use
>> of multiple call generators to handle hit/miss scenarios for a particular
>> receiver type. The newly added hybrid call generator is lazy and called
>> during incremental inlining optimization. It also relieves the inline
>> expander of handling slow paths, which can easily be implemented on the
>> library side (Java).
>>
>> Vector API jtreg tests pass at AVX level 2, remaining validation in progress.
>>
>> Performance numbers:
>>
>>
>> System : 13th Gen Intel(R) Core(TM) i3-1315U
>>
>> Baseline:
>> Benchmark                                                (size)   Mode  Cnt      Score  Error  Units
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex1     1024  thrpt    2   9444.444         ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithConstantIndex2     1024  thrpt    2  10009.319         ops/ms
>> VectorSliceBenchmark.byteVectorSliceWithVariableIndex      1024  thrpt    2   9081.926         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex1      1024  thrpt    2   6085.825         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithConstantIndex2      1024  thrpt    2   6505.378         ops/ms
>> VectorSliceBenchmark.intVectorSliceWithVariableIndex       1024  thrpt    2   6204.489         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex1     1024  thrpt    2   1651.334         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithConstantIndex2     1024  thrpt    2   1642.784         ops/ms
>> VectorSliceBenchmark.longVectorSliceWithVariableIndex      1024  thrpt    2   1474.808         ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex1    1024  thrpt    2  10399.394         ops/ms
>> VectorSliceBenchmark.shortVectorSliceWithConstantIndex2    1024  thrpt    2  10502.894         ops/ms
>> VectorSliceB...
>
> Jatin Bhateja has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Update callGenerator.hpp copyright year
test/hotspot/jtreg/compiler/vectorapi/TestSliceOptValueTransforms.java line 101:
> 99: .slice(0, ByteVector.fromArray(BSP, bsrc2, i))
> 100: .intoArray(bdst, i);
> 101: }
Would you mind adding a correctness check to these tests? For the byte type, for example:
    @DontInline
    static void verifyVectorSliceByte(int origin) {
        for (int i = 0; i < BSP.loopBound(SIZE); i += BSP.length()) {
            int index = i;
            for (int j = i + origin; j < i + BSP.length(); j++) {
                Asserts.assertEquals(bsrc1[j], bdst[index++]);
            }
            for (int j = i; j < i + origin; j++) {
                Asserts.assertEquals(bsrc2[j], bdst[index++]);
            }
        }
    }
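
The layout this check expects can also be written as a scalar model for a single vector's worth of lanes. This is only a minimal sketch, assuming the usual slice semantics (lanes origin..VLEN-1 of the first vector, followed by lanes 0..origin-1 of the second). The names SliceModel and slice are hypothetical and are not part of the test or of the Vector API:

```java
// Hypothetical scalar model of v1.slice(origin, v2) for one vector of byte lanes.
// Not taken from the PR; the names and the argument checks are assumptions.
final class SliceModel {
    static byte[] slice(byte[] first, byte[] second, int origin) {
        int vlen = first.length;
        if (second.length != vlen || origin < 0 || origin > vlen) {
            throw new IllegalArgumentException("bad lane count or origin");
        }
        byte[] result = new byte[vlen];
        // Lanes origin..vlen-1 of the first vector land at the start of the result.
        System.arraycopy(first, origin, result, 0, vlen - origin);
        // Lanes 0..origin-1 of the second vector fill the remaining tail.
        System.arraycopy(second, 0, result, vlen - origin, origin);
        return result;
    }
}
```

With origin 0 the model returns the first vector unchanged, which is the case the slice(0, ...) test quoted above exercises.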
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24104#discussion_r2386593970