On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan
<[email protected]> wrote:
> Currently the rearrange and selectFrom APIs check shuffle indices and throw
> IndexOutOfBoundsException if there is any exceptional source index in the
> shuffle. This causes the generated code to be less optimal. This PR modifies
> the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of
> checkIndexes and adds optimizations to generate efficient code.
>
> Summary of changes is as follows:
> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
> 2) Intrinsics for wrapIndexes and selectFrom to generate efficient code.
>
> For the following source:
>
>
> public void test() {
>     var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
>     for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
>         var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
>         index.selectFrom(inpvect).intoArray(byteres, j);
>     }
> }
>
>
> The code generated for the inner main loop now looks as follows:
> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of
> N173 strip mined) Freq: 4160.96
> 0x00007f40d02274d0: movslq %ebx,%r13
> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1
> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1
> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1)
> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1
> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1
> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1)
> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1
> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1
> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1)
> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1
> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1
> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1)
> 0x00007f40d022751f: add $0x40,%ebx
> 0x00007f40d0227522: cmp %r8d,%ebx
> 0x00007f40d0227525: jl 0x00007f40d02274d0
>
> Best Regards,
> Sandhya
src/hotspot/share/opto/vectorIntrinsics.cpp line 2206:
> 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE);
> 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem);
> 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc,
> v1, T_BYTE, num_elem));
We could be more optimal here and avoid the down cast and the subsequent load
shuffle in applicable scenarios, e.g. when the indexes are held in integral vectors.
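For context, the wrap-versus-check distinction the PR relies on can be sketched in
scalar form. This is a simplified illustration only, not the actual JDK
implementation: it assumes the vector length is a power of two (as VLENGTH is for
the species above), and the class and method names are hypothetical.

```java
public class WrapVsCheck {
    // "Check" semantics: any out-of-range source index throws, which forces
    // the compiled code to keep a range-check path alive.
    static int checkIndex(int i, int len) {
        if (i < 0 || i >= len) {
            throw new IndexOutOfBoundsException(Integer.toString(i));
        }
        return i;
    }

    // "Wrap" semantics: the index is reduced into [0, len - 1]. With len a
    // power of two this is a single bitwise AND, so no branch is needed and
    // the shuffle can compile down to one vpshufb as in the listing above.
    static int wrapIndex(int i, int len) {
        return i & (len - 1);
    }

    public static void main(String[] args) {
        int len = 16; // e.g. a 128-bit ByteVector has VLENGTH 16
        System.out.println(wrapIndex(17, len));  // wraps instead of throwing
        System.out.println(wrapIndex(-1, len));  // an "exceptional" negative index also wraps
        System.out.println(checkIndex(5, len));  // in range, passes through
    }
}
```

The point of the sketch is only that wrapping is branch-free for power-of-two
lengths, which is what lets the intrinsic emit a plain shuffle instruction.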
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1758203424