On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan <sviswanat...@openjdk.org> wrote:
>> Currently the rearrange and selectFrom APIs check shuffle indices and throw >> IndexOutOfBoundsException if there is any exceptional source index in the >> shuffle. This causes the generated code to be less optimal. This PR modifies >> the rearrange/selectFrom Vector API methods to perform wrapIndexes instead >> of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. >> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += >> bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of >> N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one > additional commit since the last revision: > > Change method name src/hotspot/share/opto/vectorIntrinsics.cpp line 772: > 770: > 771: if (elem_klass == nullptr || shuffle_klass == nullptr || > shuffle->is_top() || vlen == nullptr) { > 772: return false; // dead code Why dead code in comment ? this is a failed intrinsification condition. src/hotspot/share/opto/vectorIntrinsics.cpp line 776: > 774: if (!vlen->is_con() || shuffle_klass->const_oop() == nullptr) { > 775: return false; // not enough info for intrinsification > 776: } Why don't you club it with above conditions to be consistent with other inline expanders ? src/hotspot/share/opto/vectorIntrinsics.cpp line 790: > 788: // Shuffles use byte array based backing storage > 789: BasicType shuffle_bt = T_BYTE; > 790: No need a of new line b/w 789 and 791 src/hotspot/share/opto/vectorIntrinsics.cpp line 793: > 791: if (!arch_supports_vector(Op_AndV, num_elem, shuffle_bt, > VecMaskNotUsed) || > 792: !arch_supports_vector(Op_Replicate, num_elem, shuffle_bt, > VecMaskNotUsed)) { > 793: return false; You should emit proper intrinsification failure message here. src/hotspot/share/opto/vectorIntrinsics.cpp line 805: > 803: const TypeVect* vt = TypeVect::make(shuffle_bt, num_elem); > 804: const Type* shuffle_type_bt = Type::get_const_basic_type(shuffle_bt); > 805: No need of a blank line here. src/hotspot/share/opto/vectorIntrinsics.cpp line 808: > 806: Node* mod_mask = gvn().makecon(TypeInt::make(num_elem-1)); > 807: Node* bcast_mod_mask = > gvn().transform(VectorNode::scalar2vector(mod_mask, num_elem, > shuffle_type_bt)); > 808: Remove redundant new line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766272449 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766273205 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766273880 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766274718 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766275107 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766275345