On Tue, 7 Mar 2023 18:23:42 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

> `Vector::slice` is a method at the top-level class of the Vector API that
> concatenates the 2 inputs into an intermediate composite and extracts a
> window equal to the size of the inputs into the result. It is used in vector
> conversion methods where the part number is not 0 to slice the parts to the
> correct positions. Slicing is also used in text processing such as UTF-8 and
> UTF-16 validation. x86 starting from SSSE3 has `palignr`, which does vector
> slicing very efficiently. As a result, I think it is beneficial to add a C2
> node for this operation as well as to intrinsify the `Vector::slice` method.
>
> A slice is currently implemented as
> `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)`, which requires
> preparation of the index vector and the blending mask. Even with the
> preparations being hoisted out of the loops, microbenchmarks show improvement
> using the slice intrinsics. Some have tremendous increases in throughput due
> to the limitation that a mask of length 2 cannot currently be intrinsified,
> leading to falling back to the Java implementations.
>
> Please take a look and have some reviews. Thank you very much.

test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 65:

> 63:                 Asserts.assertEquals(expected, dst[i][j]);
> 64:             }
> 65:         }

It should be possible to factor out this code into something like this:

    assertOffsets(length, (expected, i, j) -> Asserts.assertEquals((byte)expected, dst[i][j]));

test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 68:

> 66:
> 67:         length = 16;
> 68:         testB128(dst, src1, src2);

Should `dst` be zeroed before the next call? It may be easier to just reallocate it.

test/jdk/jdk/incubator/vector/templates/Kernel-Slice-bop-const.template line 1:

> 1:         $type$[] a = fa.apply(SPECIES.length());

Did you forget to commit the updated unit tests?

-------------

PR: https://git.openjdk.org/jdk/pull/12909
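
For anyone following the thread, the slice semantics described in the quoted PR description (concatenate the two inputs, then take a window the size of one input) can be modelled with plain arrays. This is only a scalar sketch, not the code under review: the class name `SliceSketch`, the helper `slice`, and the accepted origin range of 0 to the vector length inclusive are my assumptions for illustration.

```java
import java.util.Arrays;

public class SliceSketch {
    // Hypothetical scalar model of v1.slice(origin, v2):
    // lane i of the result is lane (origin + i) of the concatenation v1 ++ v2.
    static byte[] slice(byte[] v1, byte[] v2, int origin) {
        int length = v1.length;
        // Assumption: both inputs have the same length and 0 <= origin <= length.
        if (v2.length != length || origin < 0 || origin > length) {
            throw new IllegalArgumentException("bad inputs for slice");
        }
        byte[] result = new byte[length];
        for (int i = 0; i < length; i++) {
            int k = origin + i;
            // Take from v1 while still inside it, then continue into v2.
            result[i] = (k < length) ? v1[k] : v2[k - length];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] a = {0, 1, 2, 3};
        byte[] b = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(a, b, 1))); // prints [1, 2, 3, 4]
    }
}
```

With an origin of 0 the result is just the first input, and with an origin equal to the length it is the second input, so a test that reuses `dst` across calls can pass by accident if stale values happen to match.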