Hi,

This is a redo of https://github.com/openjdk/jdk/pull/13093, consisting mostly of a revert of the backout.
Regarding the related issues:

- [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) were fixed before the backout.
- [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was caused by a missing `@ForceInline` on `AbstractVector::toBitsVectorTemplate`.
- [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592): I have not been able to find the root cause. I'm not sure whether this is a blocker; currently I cannot even build the x86-32 tests.

Finally, I moved the implementations of some public methods, and of methods that call into intrinsics, to the concrete classes, as that may help the compiler know the exact types of the variables.

Please take a look and leave reviews. Thanks a lot.

The description of the original PR:

This patch reimplements `VectorShuffle` as a vector of the bit type (the integral type with the same lane width as the element type, e.g. `int` for `float`). Currently, `VectorShuffle` is stored as a byte array and is expanded upon usage. This poses several drawbacks:

- Inefficient conversions between a shuffle and its corresponding vector. This hinders performance when the shuffle indices are not constant but are loaded or computed dynamically.
- Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before the `rearrange` operation is executed.
- Some redundant intrinsics are needed to support this handling, as well as special considerations in the C2 compiler.
- Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than their integral counterparts.
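To make the layout and range-check points above concrete, here is a minimal plain-Java model (it does not use `jdk.incubator.vector`) of a 4-lane `float` shuffle stored the old way, as bytes, and the new way, as `int`, the bits type of `float`. The class and method names are hypothetical, and the partial-wrapping formula `Math.floorMod(i, VLENGTH) - VLENGTH` for out-of-range indices is an assumption made for illustration. The point is only that the new layout needs no widening, and its range check is an integral sign test rather than an FP conversion followed by an FP comparison.

```java
import java.util.Arrays;

// Plain-Java model of two shuffle layouts for a 4-lane FloatVector.
// All names are hypothetical; this is not the JDK implementation.
class ShuffleLayoutSketch {
    static final int VLENGTH = 4;

    // Assumed encoding: valid indices are kept as-is; out-of-range ones become
    // negative "exceptional" indices in [-VLENGTH, 0).
    static int partiallyWrap(int i) {
        return (i >= 0 && i < VLENGTH) ? i : Math.floorMod(i, VLENGTH) - VLENGTH;
    }

    // Old layout: indices are stored as bytes, so every use has to widen them.
    static byte[] toByteShuffle(float[] v) {
        byte[] s = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            s[i] = (byte) partiallyWrap((int) v[i]);
        }
        return s;
    }

    // Old-style range check via toVector(): widen to float lanes, compare as FP.
    static boolean anyExceptionalOld(byte[] s) {
        for (byte b : s) {
            float lane = b;          // integral -> FP conversion per lane
            if (lane < 0.0f) {       // FP comparison
                return true;
            }
        }
        return false;
    }

    // New layout: indices live in the bits type of float (int), so no widening
    // is needed when the shuffle is used.
    static int[] toBitsShuffle(float[] v) {
        int[] s = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            s[i] = partiallyWrap((int) v[i]);
        }
        return s;
    }

    // New-style range check: a plain integral sign test per lane.
    static boolean anyExceptionalNew(int[] s) {
        for (int idx : s) {
            if (idx < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        float[] indices = {3f, 0f, 5f, 1f};
        byte[] oldShuffle = toByteShuffle(indices);
        int[] newShuffle = toBitsShuffle(indices);
        System.out.println(Arrays.toString(newShuffle)); // [3, 0, -3, 1]
        System.out.println(anyExceptionalOld(oldShuffle) + " " + anyExceptionalNew(newShuffle)); // true true
    }
}
```

Both checks agree on the result; the model only shows which conversions and comparisons each layout has to perform, not how C2 vectorizes them.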
With these changes, a `rearrange` can emit more efficient code:

```java
import jdk.incubator.vector.IntVector;

// SRC1, SRC2 and DST are int[] arrays
var species = IntVector.SPECIES_128;
var v1 = IntVector.fromArray(species, SRC1, 0);
var v2 = IntVector.fromArray(species, SRC2, 0);
v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
```

Before:

```
movabs $0x751589fa8,%r10            ; {oop([I{0x0000000751589fa8})}
vmovdqu 0x10(%r10),%xmm2
movabs $0x7515a0d08,%r10            ; {oop([I{0x00000007515a0d08})}
vmovdqu 0x10(%r10),%xmm1
movabs $0x75158afb8,%r10            ; {oop([I{0x000000075158afb8})}
vmovdqu 0x10(%r10),%xmm0
vpand -0xddc12(%rip),%xmm0,%xmm0    # Stub::vector_int_to_byte_mask ; {external_word}
vpackusdw %xmm0,%xmm0,%xmm0
vpackuswb %xmm0,%xmm0,%xmm0
vpmovsxbd %xmm0,%xmm3
vpcmpgtd %xmm3,%xmm1,%xmm3
vtestps %xmm3,%xmm3
jne 0x00007fc2acb4e0d8
vpmovzxbd %xmm0,%xmm0
vpermd %ymm2,%ymm0,%ymm0
movabs $0x751588f98,%r10            ; {oop([I{0x0000000751588f98})}
vmovdqu %xmm0,0x10(%r10)
```

After:

```
movabs $0x751589c78,%r10            ; {oop([I{0x0000000751589c78})}
vmovdqu 0x10(%r10),%xmm1
movabs $0x75158ac88,%r10            ; {oop([I{0x000000075158ac88})}
vmovdqu 0x10(%r10),%xmm2
vpxor %xmm0,%xmm0,%xmm0
vpcmpgtd %xmm2,%xmm0,%xmm3
vtestps %xmm3,%xmm3
jne 0x00007fa818b27cb1
vpermd %ymm1,%ymm2,%ymm0
movabs $0x751588c68,%r10            ; {oop([I{0x0000000751588c68})}
vmovdqu %xmm0,0x10(%r10)
```

-------------

Commit messages:

- copyright year
- remove LoadShuffle from riscv, whitespace
- tighten concrete types
- [vectorapi] Refactor VectorShuffle implementation

Changes:

- Changes: https://git.openjdk.org/jdk/pull/21042/files
- Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=00
- Issue: https://bugs.openjdk.org/browse/JDK-8310691
- Stats: 4984 lines in 64 files changed: 2984 ins; 981 del; 1019 mod
- Patch: https://git.openjdk.org/jdk/pull/21042.diff
- Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042

PR: https://git.openjdk.org/jdk/pull/21042
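The `After` listing in the description above can be read as a short sequence of steps. The scalar Java sketch below is a hypothetical model (not the JDK's `rearrange` code) for a 4-lane `int` vector whose shuffle indices are already stored as `int` and assumed to lie in `[-VLENGTH, VLENGTH)`. It performs one check for negative (exceptional) indices, corresponding to `vpxor`/`vpcmpgtd`/`vtestps`/`jne`, followed by a permute corresponding to `vpermd`, with no `vpmovsxbd`/`vpmovzxbd` widening of the indices. It assumes, following the `VectorShuffle` documentation, that `rearrange` throws `IndexOutOfBoundsException` for exceptional indices; the `jne` target itself is not shown in the listing.

```java
import java.util.Arrays;

// Scalar model of the "After" listing for a 4-lane IntVector.
// Hypothetical helper; not the JDK's rearrange implementation.
class RearrangeSketch {
    // shuffle: indices already in the bits type (int), as in the new layout,
    // assumed to be partially wrapped into [-VLENGTH, VLENGTH).
    static int[] rearrange(int[] v, int[] shuffle) {
        // vpxor + vpcmpgtd + vtestps: is any lane index < 0 ?
        boolean exceptional = false;
        for (int idx : shuffle) {
            exceptional |= 0 > idx;
        }
        // jne: branch away when an exceptional index is present; modeled as a throw
        if (exceptional) {
            throw new IndexOutOfBoundsException("exceptional shuffle index");
        }
        // vpermd: r[i] = v[shuffle[i]], using the int indices directly
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[shuffle[i]];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] src1 = {10, 20, 30, 40};
        int[] src2 = {3, 2, 1, 0};
        int[] dst = rearrange(src1, src2);
        System.out.println(Arrays.toString(dst)); // [40, 30, 20, 10]
    }
}
```

Compared with the `Before` listing, the model has no packing (`vpackusdw`/`vpackuswb`) or widening step, because the indices never leave the `int` representation.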