On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan <sviswanat...@openjdk.org> wrote:
>> Currently the rearrange and selectFrom APIs check shuffle indices and throw >> IndexOutOfBoundsException if there is any exceptional source index in the >> shuffle. This causes the generated code to be less optimal. This PR modifies >> the rearrange/selectFrom Vector API methods to perform wrapIndexes instead >> of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. >> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += >> bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of >> N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one > additional commit since the last revision: > > Change method name Hi @sviswa7 , some comments, overall patch looks good to me. Best Regards, Jatin src/hotspot/share/opto/vectorIntrinsics.cpp line 2120: > 2118: > 2119: if (vector_klass == nullptr || elem_klass == nullptr || vlen == > nullptr) { > 2120: return false; // dead code Why dead code in comments ? src/hotspot/share/opto/vectorIntrinsics.cpp line 2129: > 2127: NodeClassNames[argument(2)->Opcode()], > 2128: NodeClassNames[argument(3)->Opcode()]); > 2129: return false; // not enough info for intrinsification Please club this with above condition to be consistent with other inline expanders. src/hotspot/share/opto/vectorIntrinsics.cpp line 2141: > 2139: } > 2140: BasicType elem_bt = elem_type->basic_type(); > 2141: Remove new line. src/hotspot/share/opto/vectorIntrinsics.cpp line 2144: > 2142: int num_elem = vlen->get_con(); > 2143: if ((num_elem < 4) || !is_power_of_2(num_elem)) { > 2144: log_if_needed(" ** vlen < 4 or not power of two=%d", num_elem); Will num_elem < 4 not be handled by L2149 since we have an implementation limitation to support less than 32-bit shuffle / masks. src/hotspot/share/opto/vectorIntrinsics.cpp line 2171: > 2169: use_predicate = false; > 2170: if(!is_masked_op || > 2171: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, > VecMaskNotUsed) || Suggestion: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskUseLoad) || src/hotspot/share/opto/vectorIntrinsics.cpp line 2188: > 2186: > 2187: if (v1 == nullptr || v2 == nullptr) { > 2188: return false; // operand unboxing failed To be consistent with other expanders please emit proper error for unboxing failure like on following line. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426 src/hotspot/share/opto/vectorIntrinsics.cpp line 2197: > 2195: mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem); > 2196: if (mask == nullptr) { > 2197: log_if_needed(" ** not supported: op=selectFrom vlen=%d etype=%s > is_masked_op=1", Error should an unboxing failure here. ------------- PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2314643808 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277056 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277739 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766278169 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766297640 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766292679 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766303620 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766304688