On Tue, 4 Apr 2023 13:46:12 GMT, Quan Anh Mai <[email protected]> wrote:
>> `Vector::slice` is a method at the top-level class of the Vector API that
>> concatenates the 2 inputs into an intermediate composite and extracts a
>> window equal to the size of the inputs into the result. It is used in vector
>> conversion methods where the part number is not 0 to slice the parts to the
>> correct positions. Slicing is also used in text processing such as utf8 and
>> utf16 validation. x86 starting from SSSE3 has `palignr` which does vector
>> slicing very efficiently. As a result, I think it is beneficial to add a C2
>> node for this operation as well as intrinsify `Vector::slice` method.
>>
>> A slice is currently implemented as
>> `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires
>> preparation of the index vector and the blending mask. Even with the
>> preparations hoisted out of the loops, microbenchmarks show improvement
>> with the slice intrinsic. Some show very large throughput increases because
>> a mask of length 2 currently cannot be intrinsified, so those cases fall
>> back to the Java implementation.
>>
>> Please take a look and review. Thank you very much.
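For readers following the thread, the slice semantics described above can be modelled with plain arrays. This is only a scalar sketch of the behaviour (concatenate the two inputs, then take a window of the input length starting at an origin lane); the class and method names are hypothetical, and it is not the code in the PR or the C2 node itself.

```java
public class SliceModel {
    // Scalar model of a slice: result[i] = concat(v1, v2)[origin + i].
    // v1 supplies the leading lanes and v2 the lanes that follow it.
    static byte[] slice(byte[] v1, byte[] v2, int origin) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("inputs must have the same length");
        }
        int n = v1.length;
        if (origin < 0 || origin > n) {
            throw new IndexOutOfBoundsException("origin out of range: " + origin);
        }
        byte[] result = new byte[n];
        for (int i = 0; i < n; i++) {
            int j = origin + i;                      // index into the composite
            result[i] = j < n ? v1[j] : v2[j - n];   // first half or second half
        }
        return result;
    }
}
```

With an origin of 0 the model returns a copy of the first input, and with an origin equal to the lane count it returns a copy of the second.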
>
> Quan Anh Mai has updated the pull request incrementally with one additional
> commit since the last revision:
>
> style
test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 466:
> 464: @IR(counts = {IRNode.VECTOR_SLICE, "17"})
> 465: static void testB128(byte[][] dst, byte[] src1, byte[] src2) {
> 466: var species = ByteVector.SPECIES_128;
I suggest defining the species as a `private static final` field of this test
class. Intrinsification may fail if the species is not a constant from the
compiler's point of view.
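A minimal sketch of what I mean, assuming a standalone class rather than the actual test (the names `SliceSpeciesSketch`, `B128` and `sliceAt4` are hypothetical, and the origin of 4 is arbitrary). It needs `--add-modules jdk.incubator.vector` to compile and run:

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorSpecies;

public class SliceSpeciesSketch {
    // Held in a static final field so the JIT can treat the species as a constant.
    private static final VectorSpecies<Byte> B128 = ByteVector.SPECIES_128;

    // Loads 16 bytes from each source and stores the slice starting at lane 4:
    // lanes 4..15 of src1 followed by lanes 0..3 of src2.
    static void sliceAt4(byte[] dst, byte[] src1, byte[] src2) {
        ByteVector v1 = ByteVector.fromArray(B128, src1, 0);
        ByteVector v2 = ByteVector.fromArray(B128, src2, 0);
        v1.slice(4, v2).intoArray(dst, 0);
    }
}
```

The test methods would then refer to the field instead of a local `var species`.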
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1159206009