`Vector::slice` is a method on the top-level class of the Vector API. It 
concatenates its two inputs into an intermediate composite and extracts a 
window, equal in size to one input, into the result. It is used in the vector 
conversion methods, where a non-zero part number requires slicing the parts 
into the correct positions. Slicing is also used in text processing, such as 
UTF-8 and UTF-16 validation. On x86, SSSE3 and later provide `palignr`, which 
performs vector slicing very efficiently. I therefore think it is beneficial 
to add a C2 node for this operation and to intrinsify the `Vector::slice` 
method.
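
For readers unfamiliar with the operation, the following is a minimal scalar 
sketch of the slice semantics described above, using plain `int` arrays in 
place of vectors. The class `SliceModel` and its `slice` helper are 
hypothetical names introduced only for illustration; they are not part of the 
Vector API or of this patch, and the accepted range of `origin` is an 
assumption of the sketch.

```java
import java.util.Arrays;

// Hypothetical scalar model of a vector slice; not code from the patch.
public class SliceModel {
    // Lane i of the result is lane (origin + i) of the concatenation v1 ++ v2.
    static int[] slice(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        // Assumption of this sketch: origin is allowed in the range [0, n].
        if (v2.length != n || origin < 0 || origin > n) {
            throw new IllegalArgumentException("bad lengths or origin");
        }
        // Build the intermediate composite of both inputs.
        int[] composite = new int[2 * n];
        System.arraycopy(v1, 0, composite, 0, n);
        System.arraycopy(v2, 0, composite, n, n);
        // Extract a window of n lanes starting at origin.
        return Arrays.copyOfRange(composite, origin, origin + n);
    }

    public static void main(String[] args) {
        int[] v1 = {0, 1, 2, 3};
        int[] v2 = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(v1, v2, 1))); // prints [1, 2, 3, 4]
    }
}
```

An instruction such as `palignr` performs this concatenate-and-extract step 
directly on byte lanes, which is why a dedicated node can avoid the separate 
shuffle and blend.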

A slice is currently implemented as 
`v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)`, which requires 
preparing the index vector and the blending mask. Even with that preparation 
hoisted out of the loops, microbenchmarks show improvements with the slice 
intrinsic. Some show very large increases in throughput, because a mask of 
length 2 cannot currently be intrinsified and those cases fall back to the 
Java implementation.
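
The rearrange-and-blend formulation can be modelled on scalar arrays as 
follows. This is only a sketch: `SliceViaBlendModel` and its helpers are 
hypothetical, and the exact contents of `iota` and `blendMask` here (a 
wrapping index and a mask set for the first `n - origin` lanes) are 
assumptions chosen to reproduce the slice window, not a copy of the JDK's 
Java implementation.

```java
import java.util.Arrays;

// Hypothetical scalar model of slice built from rearrange and blend.
public class SliceViaBlendModel {
    // result[i] = v[idx[i]], mirroring a lane shuffle.
    static int[] rearrange(int[] v, int[] idx) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[idx[i]];
        }
        return r;
    }

    // Lanes where the mask is set come from b, the others from a.
    static int[] blend(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? b[i] : a[i];
        }
        return r;
    }

    static int[] slice(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        int[] iota = new int[n];             // assumed: wrapping index vector
        boolean[] blendMask = new boolean[n]; // assumed: set where the lane comes from v1
        for (int i = 0; i < n; i++) {
            iota[i] = (i + origin) % n;
            blendMask[i] = i < n - origin;
        }
        // Same shape as v2.rearrange(iota).blend(v1.rearrange(iota), blendMask).
        return blend(rearrange(v2, iota), rearrange(v1, iota), blendMask);
    }

    public static void main(String[] args) {
        int[] v1 = {0, 1, 2, 3};
        int[] v2 = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(v1, v2, 1))); // prints [1, 2, 3, 4]
    }
}
```

The model makes the cost visible: two shuffles, one blend, and the setup of 
`iota` and `blendMask`, compared with a single concatenate-and-extract 
operation.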

Please take a look and review. Thank you very much.

-------------

Commit messages:
 - sse2, increase warmup
 - aesthetic
 - optimise 64B
 - add jmh
 - vector slice intrinsics

Changes: https://git.openjdk.org/jdk/pull/12909/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8303762
  Stats: 1699 lines in 58 files changed: 1376 ins; 257 del; 66 mod
  Patch: https://git.openjdk.org/jdk/pull/12909.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12909/head:pull/12909

PR: https://git.openjdk.org/jdk/pull/12909
