On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan <sviswanat...@openjdk.org> wrote:
> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This makes the generated code less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes, and adds optimizations to generate efficient code.
>
> Summary of changes is as follows:
>  1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
>  2) Intrinsics for wrapIndexes and selectFrom to generate efficient code.
>
> For the following source:
>
>     public void test() {
>         var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
>         for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
>             var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
>             index.selectFrom(inpvect).intoArray(byteres, j);
>         }
>     }
>
> The code generated for the inner main loop now looks as follows:
>
> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
>   0x00007f40d02274d0: movslq %ebx,%r13
>   0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1
>   0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1)
>   0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1
>   0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1)
>   0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1
>   0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1)
>   0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
>   0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
>   0x00007f40d022751f: add $0x40,%ebx
>   0x00007f40d0227522: cmp %r8d,%ebx
>   0x00007f40d0227525: jl 0x00007f40d02274d0
>
> Best Regards,
> Sandhya

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2439:

> 2437:                 (v1, s_, m_) -> v1.uOp((i, a) -> {
> 2438:                     int ei = s_.laneSource(i);
> 2439:                     return ei < 0 || !m_.laneIsSet(i) ? 0 : v1.lane(ei);

The `ei < 0` test is redundant.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2637:

> 2635:      *
> 2636:      * For each lane {@code N} of the shuffle, and for each lane
> 2637:      * source index {@code I=s.wrapIndex(s.laneSource(N))} in the shuffle,

The pseudo code starting at line 2644 needs to be adjusted to:

    Vector<E> r = this.rearrange(s);
    return broadcast(0).blend(r, m);

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2755:

> 2753:      *
> 2754:      * The result is the same as the expression
> 2755:      * {@code v.rearrange(this.toShuffle().wrapIndexes())}.

Since we also adjusted `rearrange`, the existing expression is fine; I recommend no change here or in the mask-accepting version.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759431093
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759428672
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759418829
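To make the two review points on masked rearrange concrete (the redundant `ei < 0` test at ByteVector.java line 2439 and the suggested `broadcast(0).blend(r, m)` pseudo code), here is a minimal plain-array sketch. It is not the jdk.incubator.vector API: the class `ShuffleModel` and all of its methods are hypothetical, vectors are modeled as `int[]` and masks as `boolean[]`, and it assumes that wrapIndexes reduces each index modulo VLENGTH (modeled with `Math.floorMod`).

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical plain-array model of the shuffle semantics discussed in the
// review; NOT the jdk.incubator.vector API. Vectors are int[], masks boolean[].
final class ShuffleModel {

    // Old behaviour (model): any exceptional index throws IndexOutOfBoundsException.
    static int[] checkIndexes(int[] shuffle, int vlength) {
        for (int idx : shuffle) {
            Objects.checkIndex(idx, vlength);
        }
        return shuffle.clone();
    }

    // New behaviour (model). Assumption: wrapping reduces each index modulo
    // VLENGTH, so every result lies in [0, VLENGTH).
    static int[] wrapIndexes(int[] shuffle, int vlength) {
        int[] out = new int[shuffle.length];
        for (int i = 0; i < shuffle.length; i++) {
            out[i] = Math.floorMod(shuffle[i], vlength);
        }
        return out;
    }

    // Unmasked rearrange: lane i of the result is v[wrapped source index of lane i].
    static int[] rearrange(int[] v, int[] shuffle) {
        int[] s = wrapIndexes(shuffle, v.length);
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            r[i] = v[s[i]];
        }
        return r;
    }

    // Masked rearrange in the shape of the ByteVector fallback lambda. After
    // wrapping, ei is never negative, so only the mask test remains.
    static int[] rearrange(int[] v, int[] shuffle, boolean[] m) {
        int[] s = wrapIndexes(shuffle, v.length);
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            int ei = s[i];
            r[i] = !m[i] ? 0 : v[ei];
        }
        return r;
    }

    // a.blend(b, m): lane i comes from b where m[i] is set, otherwise from a.
    static int[] blend(int[] a, int[] b, boolean[] m) {
        int[] r = a.clone();
        for (int i = 0; i < r.length; i++) {
            if (m[i]) {
                r[i] = b[i];
            }
        }
        return r;
    }

    // The suggested pseudo code: broadcast(0).blend(this.rearrange(s), m).
    static int[] rearrangeViaBlend(int[] v, int[] shuffle, boolean[] m) {
        int[] zeros = new int[v.length];   // models broadcast(0)
        return blend(zeros, rearrange(v, shuffle), m);
    }

    public static void main(String[] args) {
        int[] v = {10, 11, 12, 13};
        int[] shuffle = {-1, 5, 2, -4};    // includes exceptional indexes
        boolean[] m = {true, false, true, true};
        try {
            checkIndexes(shuffle, v.length);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("checkIndexes rejects: " + e.getMessage());
        }
        System.out.println(Arrays.toString(rearrange(v, shuffle, m)));
        System.out.println(Arrays.toString(rearrangeViaBlend(v, shuffle, m)));
    }
}
```

Because every wrapped index lies in [0, VLENGTH), the masked form only needs the mask test, and under these assumptions it yields the same lanes as zero-blending the unmasked result.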
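The `test()` loop from the PR description can likewise be modeled in scalar Java. Per the javadoc kept at Vector.java line 2755, `index.selectFrom(inpvect)` gives the same result as `inpvect.rearrange(index.toShuffle().wrapIndexes())`, so an exceptional index no longer needs a check. In this sketch `SelectFromLoopModel`, `selectFromLoop`, and its parameter names are hypothetical. It assumes 16 byte lanes (a 128-bit species, as with `bspecies128`) and wrapping modulo VLENGTH, which for a power-of-two VLENGTH equals masking with `VLENGTH - 1`.

```java
import java.util.Arrays;

// Hypothetical scalar model of the PR's selectFrom loop; NOT the Vector API.
final class SelectFromLoopModel {
    // Byte lanes in a 128-bit vector (the bspecies128 case in the PR).
    static final int VLENGTH = 16;

    // Assumption: wrapping reduces an index modulo VLENGTH. For a power-of-two
    // VLENGTH this equals (idx & (VLENGTH - 1)), including for negative idx.
    static int wrap(int idx) {
        return idx & (VLENGTH - 1);
    }

    // Models:
    //   var index = ByteVector.fromArray(bspecies128, shuffle, 0);
    //   for (j ...) index.selectFrom(ByteVector.fromArray(bspecies128, in, j))
    //                    .intoArray(out, j);
    // Lane `lane` of each chunk takes in[j + wrap(shuffle[lane])].
    static void selectFromLoop(byte[] shuffle, byte[] in, byte[] out) {
        int bound = in.length - (in.length % VLENGTH);   // models loopBound(size)
        for (int j = 0; j < bound; j += VLENGTH) {
            for (int lane = 0; lane < VLENGTH; lane++) {
                out[j + lane] = in[j + wrap(shuffle[lane])];
            }
        }
    }

    public static void main(String[] args) {
        byte[] in = new byte[32];
        for (int i = 0; i < in.length; i++) {
            in[i] = (byte) i;
        }
        byte[] shuffle = new byte[VLENGTH];
        for (int lane = 0; lane < VLENGTH; lane++) {
            shuffle[lane] = (byte) (15 - lane);   // reverse each 16-byte chunk
        }
        shuffle[0] = -1;   // exceptional index: wraps to 15 instead of throwing
        byte[] out = new byte[in.length];
        selectFromLoop(shuffle, in, out);
        System.out.println(Arrays.toString(out));
    }
}
```

With a loop-invariant index vector, each chunk is the same byte permutation of its input, which is consistent with the single `vpshufb` per 16-byte chunk in the generated code quoted above.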