Re: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v5]

Quan Anh Mai Thu, 30 Mar 2023 07:31:43 -0700

> Hi,
> 
> This patch reimplements `VectorShuffle` so that a shuffle is backed by a 
> vector of the bit type. Currently, a `VectorShuffle` is stored as a byte 
> array and is expanded on use. This has several drawbacks:
> 
> 1. Inefficient conversions between a shuffle and its corresponding vector. 
> This hinders the performance when the shuffle indices are not constant and 
> are loaded or computed dynamically.
> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems 
> that a shuffle index vector is always expanded to the correct type before 
> executing the `rearrange` operations.
> 3. Some redundant intrinsics are needed to support this handling as well as 
> special considerations in the C2 compiler.
> 4. Range checks are performed using `VectorShuffle::toVector`, which is 
> inefficient for FP types since both FP conversions and FP comparisons are 
> more expensive than the integral ones.
> 
> With these changes, a `rearrange` can compile to more efficient code:
> 
>     var species = IntVector.SPECIES_128;
>     var v1 = IntVector.fromArray(species, SRC1, 0);
>     var v2 = IntVector.fromArray(species, SRC2, 0);
>     v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
> 
>     Before:
>     movabs $0x751589fa8,%r10            ;   {oop([I{0x0000000751589fa8})}
>     vmovdqu 0x10(%r10),%xmm2
>     movabs $0x7515a0d08,%r10            ;   {oop([I{0x00000007515a0d08})}
>     vmovdqu 0x10(%r10),%xmm1
>     movabs $0x75158afb8,%r10            ;   {oop([I{0x000000075158afb8})}
>     vmovdqu 0x10(%r10),%xmm0
>     vpand  -0xddc12(%rip),%xmm0,%xmm0        # Stub::vector_int_to_byte_mask
>                                         ;   {external_word}
>     vpackusdw %xmm0,%xmm0,%xmm0
>     vpackuswb %xmm0,%xmm0,%xmm0
>     vpmovsxbd %xmm0,%xmm3
>     vpcmpgtd %xmm3,%xmm1,%xmm3
>     vtestps %xmm3,%xmm3
>     jne    0x00007fc2acb4e0d8
>     vpmovzxbd %xmm0,%xmm0
>     vpermd %ymm2,%ymm0,%ymm0
>     movabs $0x751588f98,%r10            ;   {oop([I{0x0000000751588f98})}
>     vmovdqu %xmm0,0x10(%r10)
> 
>     After:
>     movabs $0x751589c78,%r10            ;   {oop([I{0x0000000751589c78})}
>     vmovdqu 0x10(%r10),%xmm1
>     movabs $0x75158ac88,%r10            ;   {oop([I{0x000000075158ac88})}
>     vmovdqu 0x10(%r10),%xmm2
>     vpxor  %xmm0,%xmm0,%xmm0
>     vpcmpgtd %xmm2,%xmm0,%xmm3
>     vtestps %xmm3,%xmm3
>     jne    0x00007fa818b27cb1
>     vpermd %ymm1,%ymm2,%ymm0
>     movabs $0x751588c68,%r10            ;   {oop([I{0x0000000751588c68})}
>     vmovdqu %xmm0,0x10(%r10)
>     
> Please take a look and leave reviews. Thanks a lot.
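
For readers less familiar with the shuffle layout, the difference described in the quoted description can be modelled in scalar Java. This is only a sketch under stated assumptions: the class and method names (`ShuffleSketch`, `rearrangeViaBytes`, `rearrangeDirect`) are hypothetical, it does not use `jdk.incubator.vector`, and it simplifies the range check by rejecting any index outside `[0, lanes)` rather than modelling how the API wraps out-of-range values into exceptional indices.

```java
import java.util.Arrays;

// Hypothetical scalar model; not the code in the patch.
final class ShuffleSketch {

    // Simplified model of the old layout: indices are narrowed to a byte
    // array and widened back to the lane type every time they are used.
    static int[] rearrangeViaBytes(int[] src, int[] indices) {
        byte[] shuffle = new byte[indices.length];
        for (int i = 0; i < indices.length; i++) {
            shuffle[i] = (byte) indices[i];   // narrowing: int -> byte
        }
        int[] expanded = new int[shuffle.length];
        for (int i = 0; i < shuffle.length; i++) {
            expanded[i] = shuffle[i];         // widening: byte -> int
        }
        return rearrangeDirect(src, expanded);
    }

    // Simplified model of the new layout: indices already have the lane
    // width, so they are range-checked and used without any conversion.
    static int[] rearrangeDirect(int[] src, int[] indices) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            int idx = indices[i];
            // Assumption: any index outside [0, lanes) is treated as invalid.
            if (idx < 0 || idx >= src.length) {
                throw new IndexOutOfBoundsException("lane " + i + ": " + idx);
            }
            dst[i] = src[idx];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        int[] idx = {3, 2, 1, 0};
        System.out.println(Arrays.toString(rearrangeViaBytes(src, idx)));
        System.out.println(Arrays.toString(rearrangeDirect(src, idx)));
    }
}
```

Both methods produce the same result for valid indices; the first simply does two extra passes over the indices, which roughly corresponds to the pack and extend instructions that appear only in the "Before" listing.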


Quan Anh Mai has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now contains 14 commits:

 - move implementations up
 - Merge branch 'master' into shufflerefactor
 - Merge branch 'master' into shufflerefactor
 - reviews
 - missing casts
 - clean up
 - fix Matcher::vector_needs_load_shuffle
 - fix internal types, clean up
 - optimise laneIsValid
 - Merge branch 'master' into shufflerefactor
 - ... and 4 more: https://git.openjdk.org/jdk/compare/d063b896...a4835c00

-------------

Changes: https://git.openjdk.org/jdk/pull/13093/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=04
  Stats: 3683 lines in 64 files changed: 1610 ins; 1169 del; 904 mod
  Patch: https://git.openjdk.org/jdk/pull/13093.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13093/head:pull/13093

PR: https://git.openjdk.org/jdk/pull/13093
