Re: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation

Paul Sandoz Thu, 19 Sep 2024 17:03:07 -0700

On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai <[email protected]> wrote:


> Hi,
> 
> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just 
> the revert of the backout.
> 
> Regarding the related issues:
> 
> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and 
> [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed 
> before the backout.
> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to 
> missing `ForceInline` on `AbstractVector::toBitsVectorTemplate`
> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been 
> able to find the root causes. I'm not sure if this is a blocker, now I cannot 
> even build x86-32 tests.
> 
> Finally, I moved some implementation of public methods and methods that call 
> into intrinsics to the concrete class as that may help the compiler know the 
> correct types of the variables.
> 
> Please take a look and leave reviews. Thanks a lot.
> 
> The description of the original PR:
> 
> This patch reimplements `VectorShuffle` implementations to be a vector of the 
> bit type. Currently, `VectorShuffle` is stored as a byte array, and would be 
> expanded upon usage. This poses several drawbacks:
> 
> Inefficient conversions between a shuffle and its corresponding vector. This 
> hinders the performance when the shuffle indices are not constant and are 
> loaded or computed dynamically.
> Redundant expansions in `rearrange` operations. On all platforms, it seems 
> that a shuffle index vector is always expanded to the correct type before 
> executing the `rearrange` operations.
> Some redundant intrinsics are needed to support this handling as well as 
> special considerations in the C2 compiler.
> Range checks are performed using `VectorShuffle::toVector`, which is 
> inefficient for FP types since both FP conversions and FP comparisons are 
> more expensive than the integral ones.
> Upon these changes, a `rearrange` can emit more efficient code:
> 
>     var species = IntVector.SPECIES_128;
>     var v1 = IntVector.fromArray(species, SRC1, 0);
>     var v2 = IntVector.fromArray(species, SRC2, 0);
>     v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
> 
>     Before:
>     movabs $0x751589fa8,%r10            ;   {oop([I{0x0000000751589fa8})}
>     vmovdqu 0x10(%r10),%xmm2
>     movabs $0x7515a0d08,%r10            ;   {oop([I{0x00000007515a0d08})}
>     vmovdqu 0x10(%r10),%xmm1
>     movabs $0x75158afb8,%r10            ;   {oop([I{0x000000075158afb8})}
>     vmovdqu 0x10(%r10),%xmm0
>     vpand  -0xddc12(%rip),%xmm0,%xmm0        # Stub::vector_int_to_byte_mask
>                                                             ;   {ex...

> Got it, I think #20508 and this PR are unrelated implementation-wise, though.

It would be nice if we can move independently of #20508 as that may take longer 
to integrate because of API/CSR review.

> 
> @jatin-bhateja What do you think of using this patch and intrinsifing 
> `Vector<E>::rearrange(VectorShuffle<E>, Vector<E>)` instead of introducing 
> the 2 vector `selectFrom` API?

IMO the two-vector `selectFrom` API is complementary to the existing 
single-vector `selectFrom`, and both have equivalent `rearrange` expressions. 
For either use we should ideally get to the point that a similar/identical 
optimal instruction sequence is generated.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2362388082

Re: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation

Reply via email to