On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan <sviswanat...@openjdk.org> wrote:

> Currently the rearrange and selectFrom APIs check shuffle indices and throw 
> IndexOutOfBoundsException if there is any exceptional source index in the 
> shuffle. This causes the generated code to be less optimal. This PR modifies 
> the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of 
> checkIndexes and performs optimizations to generate efficient code.
> 
> Summary of changes is as follows:
>  1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
>  2) Intrinsic for wrapIndexes and selectFrom to generate efficient code
> 
> For the following source:
> 
> 
>     public void test() {
>         var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
>         for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
>             var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
>             index.selectFrom(inpvect).intoArray(byteres, j);
>         }
>     }
> 
> 
> The code generated for inner main now looks as follows:
> ;; B24: #      out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
>   0x00007f40d02274d0:   movslq %ebx,%r13
>   0x00007f40d02274d3:   vmovdqu 0x10(%rsi,%r13,1),%xmm1
>   0x00007f40d02274da:   vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d02274df:   vmovdqu %xmm1,0x10(%rax,%r13,1)
>   0x00007f40d02274e6:   vmovdqu 0x20(%rsi,%r13,1),%xmm1
>   0x00007f40d02274ed:   vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d02274f2:   vmovdqu %xmm1,0x20(%rax,%r13,1)
>   0x00007f40d02274f9:   vmovdqu 0x30(%rsi,%r13,1),%xmm1
>   0x00007f40d0227500:   vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d0227505:   vmovdqu %xmm1,0x30(%rax,%r13,1)
>   0x00007f40d022750c:   vmovdqu 0x40(%rsi,%r13,1),%xmm1
>   0x00007f40d0227513:   vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d0227518:   vmovdqu %xmm1,0x40(%rax,%r13,1)
>   0x00007f40d022751f:   add    $0x40,%ebx
>   0x00007f40d0227522:   cmp    %r8d,%ebx
>   0x00007f40d0227525:   jl     0x00007f40d02274d0
> 
> Best Regards,
> Sandhya

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2439:

> 2437:                    (v1, s_, m_) -> v1.uOp((i, a) -> {
> 2438:                         int ei = s_.laneSource(i);
> 2439:                         return ei < 0  || !m_.laneIsSet(i) ? 0 : v1.lane(ei);

The `ei < 0` test is redundant.
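
A scalar sketch of why such a guard can become dead code, assuming (my reading, not stated in the comment itself) that the shuffle has already had its indexes wrapped by the time `laneSource` is called. The class and method names are hypothetical, and `Math.floorMod` stands in for the wrapping; for a power-of-two lane count it should agree with `index & (VLENGTH - 1)`:

```java
final class WrapIndexSketch {
    // Hypothetical scalar stand-in for wrapping one lane source index
    // into the valid range [0, vlength - 1].
    static int wrapIndex(int sourceIndex, int vlength) {
        return Math.floorMod(sourceIndex, vlength);
    }

    // Same reduction written as a mask; only valid when vlength is a power of two.
    static int wrapIndexPow2(int sourceIndex, int vlength) {
        return sourceIndex & (vlength - 1);
    }

    public static void main(String[] args) {
        int vlength = 16; // lane count of a 128-bit byte vector
        for (int ei = -2 * vlength; ei < 2 * vlength; ei++) {
            int wrapped = wrapIndex(ei, vlength);
            // A wrapped index is never negative, so an "ei < 0" test cannot fire.
            if (wrapped < 0 || wrapped >= vlength
                    || wrapped != wrapIndexPow2(ei, vlength)) {
                throw new IllegalStateException("unexpected wrap for " + ei);
            }
        }
        System.out.println("all wrapped indexes are in [0, " + (vlength - 1) + "]");
    }
}
```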

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2637:

> 2635:      *
> 2636:      * For each lane {@code N} of the shuffle, and for each lane
> 2637:      * source index {@code I=s.wrapIndex(s.laneSource(N))} in the shuffle,

The pseudocode below, starting at line 2644, needs adjusting to:

Vector<E> r = this.rearrange(s);
return broadcast(0).blend(r, m);
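
To illustrate what that two-line pseudocode computes, here is a scalar model on plain arrays. It is only a sketch: the class and helper names are hypothetical, arrays stand in for vectors, and it assumes the unmasked `rearrange` wraps out-of-range indexes as this PR proposes:

```java
import java.util.Arrays;

final class MaskedRearrangeSketch {
    // Hypothetical model of unmasked rearrange: wrap each source index, then gather.
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] r = new int[v.length];
        for (int n = 0; n < v.length; n++) {
            r[n] = v[Math.floorMod(shuffle[n], v.length)];
        }
        return r;
    }

    // Model of a.blend(b, m): take the lane from b where the mask is set, else from a.
    static int[] blend(int[] a, int[] b, boolean[] m) {
        int[] r = new int[a.length];
        for (int n = 0; n < a.length; n++) {
            r[n] = m[n] ? b[n] : a[n];
        }
        return r;
    }

    // Model of the adjusted pseudocode: r = rearrange(s); broadcast(0).blend(r, m).
    static int[] maskedRearrange(int[] v, int[] shuffle, boolean[] m) {
        int[] r = rearrange(v, shuffle);
        int[] zero = new int[v.length]; // broadcast(0)
        return blend(zero, r, m);
    }

    public static void main(String[] args) {
        int[] v = {10, 20, 30, 40};
        int[] shuffle = {3, -1, 4, 1};           // -1 and 4 are out of range
        boolean[] m = {true, true, false, true}; // lane 2 is unset, so it becomes 0
        System.out.println(Arrays.toString(maskedRearrange(v, shuffle, m)));
    }
}
```

Unset lanes are zeroed by the blend alone, so the rearrange step needs no per-lane index test of its own.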

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2755:

> 2753:      *
> 2754:      * The result is the same as the expression
> 2755:      * {@code v.rearrange(this.toShuffle().wrapIndexes())}.

Since we also adjusted `rearrange`, the existing expression is fine. I recommend 
no change here or to the mask-accepting version.
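
The reasoning rests on wrapping being idempotent: if `rearrange` already wraps, wrapping the shuffle beforehand changes nothing. A scalar sketch of that equivalence follows, with hypothetical names and byte arrays standing in for `ByteVector`; it assumes `selectFrom` uses the index vector's lane values directly as source indexes:

```java
import java.util.Arrays;

final class SelectFromSketch {
    // Hypothetical model of wrapping every index of a shuffle.
    static int[] wrapIndexes(int[] shuffle, int vlength) {
        int[] w = new int[shuffle.length];
        for (int n = 0; n < shuffle.length; n++) {
            w[n] = Math.floorMod(shuffle[n], vlength);
        }
        return w;
    }

    // Hypothetical model of a rearrange that wraps its indexes itself.
    static byte[] rearrange(byte[] v, int[] shuffle) {
        byte[] r = new byte[v.length];
        for (int n = 0; n < v.length; n++) {
            r[n] = v[Math.floorMod(shuffle[n], v.length)];
        }
        return r;
    }

    // Model of index.selectFrom(v) as v.rearrange(index.toShuffle()).
    static byte[] selectFrom(byte[] index, byte[] v) {
        int[] shuffle = new int[index.length];
        for (int n = 0; n < index.length; n++) {
            shuffle[n] = index[n]; // lane value used as a source index
        }
        return rearrange(v, shuffle);
    }

    public static void main(String[] args) {
        byte[] v = {1, 2, 3, 4};
        int[] s = {0, 5, -1, 2};
        // With and without the explicit wrapIndexes step the result is the same.
        boolean same = Arrays.equals(rearrange(v, s),
                                     rearrange(v, wrapIndexes(s, v.length)));
        System.out.println("explicit wrapIndexes is a no-op: " + same);
    }
}
```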

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759431093
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759428672
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759418829
