On Tue, 21 Mar 2023 16:16:31 GMT, Quan Anh Mai <[email protected]> wrote:
>> Hi,
>>
>> This patch reimplements `VectorShuffle` implementations to be a vector of
>> the bit type. Currently, VectorShuffle is stored as a byte array, and would
>> be expanded upon usage. This poses several drawbacks:
>>
>> 1. Inefficient conversions between a shuffle and its corresponding vector.
>> This hinders the performance when the shuffle indices are not constant and
>> are loaded or computed dynamically.
>> 2. Redundant expansions in `rearrange` operations. On all platforms, it
>> seems that a shuffle index vector is always expanded to the correct type
>> before executing the `rearrange` operations.
>> 3. Some redundant intrinsics are needed to support this handling as well as
>> special considerations in the C2 compiler.
>> 4. Range checks are performed using `VectorShuffle::toVector`, which is
>> inefficient for FP types since both FP conversions and FP comparisons are
>> more expensive than the integral ones.
>>
>> Upon these changes, a `rearrange` can emit more efficient code:
>>
>> var species = IntVector.SPECIES_128;
>> var v1 = IntVector.fromArray(species, SRC1, 0);
>> var v2 = IntVector.fromArray(species, SRC2, 0);
>> v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
>>
>> Before:
>> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})}
>> vmovdqu 0x10(%r10),%xmm2
>> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})}
>> vmovdqu 0x10(%r10),%xmm1
>> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})}
>> vmovdqu 0x10(%r10),%xmm0
>> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask
>> ;
>> {external_word}
>> vpackusdw %xmm0,%xmm0,%xmm0
>> vpackuswb %xmm0,%xmm0,%xmm0
>> vpmovsxbd %xmm0,%xmm3
>> vpcmpgtd %xmm3,%xmm1,%xmm3
>> vtestps %xmm3,%xmm3
>> jne 0x00007fc2acb4e0d8
>> vpmovzxbd %xmm0,%xmm0
>> vpermd %ymm2,%ymm0,%ymm0
>> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})}
>> vmovdqu %xmm0,0x10(%r10)
>>
>> After:
>> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})}
>> vmovdqu 0x10(%r10),%xmm1
>> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})}
>> vmovdqu 0x10(%r10),%xmm2
>> vpxor %xmm0,%xmm0,%xmm0
>> vpcmpgtd %xmm2,%xmm0,%xmm3
>> vtestps %xmm3,%xmm3
>> jne 0x00007fa818b27cb1
>> vpermd %ymm1,%ymm2,%ymm0
>> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})}
>> vmovdqu %xmm0,0x10(%r10)
>>
>> Please take a look and leave reviews. Thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with two additional
> commits since the last revision:
>
> - missing casts
> - clean up
src/hotspot/cpu/aarch64/aarch64_vector.ad line 6082:
> 6080: // to implement rearrange.
> 6081:
> 6082: // Maybe move the shuffle preparation to VectorLoadShuffle
Agree that moving the shuffle computation code to `VectorLoadShuffle`. Thanks!
src/hotspot/share/opto/vectorIntrinsics.cpp line 2059:
> 2057: if (need_load_shuffle) {
> 2058: shuffle = gvn().transform(new VectorLoadShuffleNode(shuffle, vt));
> 2059: }
How about generating `VectorLoadShuffleNode` for all platforms that support
Vector API, and remove the helper method `vector_needs_load_shuffle()` ? For
those platforms that do not need this shuffle preparation, we can emit nothing
in codegen.
src/hotspot/share/opto/vectorIntrinsics.cpp line 2426:
> 2424: if (is_vector_shuffle(vbox_klass_from)) {
> 2425: return false; // vector shuffles aren't supported
> 2426: }
Is it better to change this as an "assertion" or print the log details?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144366812
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144360349
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144363416