On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan 
<sviswanat...@openjdk.org> wrote:

>> Currently the rearrange and selectFrom APIs check shuffle indices and throw 
>> IndexOutOfBoundsException if there is any exceptional source index in the 
>> shuffle. This causes the generated code to be less optimal. This PR modifies 
>> the rearrange/selectFrom Vector API methods to perform wrapIndexes instead 
>> of checkIndexes and performs optimizations to generate efficient code.
>> 
>> Summary of changes is as follows:
>>  1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
>>  2) Intrinsic for wrapIndexes and selectFrom to generate efficient code
>> 
>> For the following source:
>> 
>> 
>>     public void test() {
>>         var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
>>         for (int j = 0; j < bspecies128.loopBound(size); j += 
>> bspecies128.length()) {
>>             var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
>>             index.selectFrom(inpvect).intoArray(byteres, j);
>>         }
>>     }
>> 
>> 
>> The code generated for inner main now looks as follows:
>> ;; B24: #      out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of 
>> N173 strip mined) Freq: 4160.96
>>   0x00007f40d02274d0:   movslq %ebx,%r13
>>   0x00007f40d02274d3:   vmovdqu 0x10(%rsi,%r13,1),%xmm1
>>   0x00007f40d02274da:   vpshufb %xmm2,%xmm1,%xmm1
>>   0x00007f40d02274df:   vmovdqu %xmm1,0x10(%rax,%r13,1)
>>   0x00007f40d02274e6:   vmovdqu 0x20(%rsi,%r13,1),%xmm1
>>   0x00007f40d02274ed:   vpshufb %xmm2,%xmm1,%xmm1
>>   0x00007f40d02274f2:   vmovdqu %xmm1,0x20(%rax,%r13,1)
>>   0x00007f40d02274f9:   vmovdqu 0x30(%rsi,%r13,1),%xmm1
>>   0x00007f40d0227500:   vpshufb %xmm2,%xmm1,%xmm1
>>   0x00007f40d0227505:   vmovdqu %xmm1,0x30(%rax,%r13,1)
>>   0x00007f40d022750c:   vmovdqu 0x40(%rsi,%r13,1),%xmm1
>>   0x00007f40d0227513:   vpshufb %xmm2,%xmm1,%xmm1
>>   0x00007f40d0227518:   vmovdqu %xmm1,0x40(%rax,%r13,1)
>>   0x00007f40d022751f:   add    $0x40,%ebx
>>   0x00007f40d0227522:   cmp    %r8d,%ebx
>>   0x00007f40d0227525:   jl     0x00007f40d02274d0
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Change method name

Hi @sviswa7 , some comments, overall patch looks good to me.

Best Regards,
Jatin

src/hotspot/share/opto/vectorIntrinsics.cpp line 2120:

> 2118: 
> 2119:   if (vector_klass == nullptr  || elem_klass == nullptr || vlen == 
> nullptr) {
> 2120:     return false; // dead code

Why dead code in comments ?

src/hotspot/share/opto/vectorIntrinsics.cpp line 2129:

> 2127:                     NodeClassNames[argument(2)->Opcode()],
> 2128:                     NodeClassNames[argument(3)->Opcode()]);
> 2129:     return false; // not enough info for intrinsification

Please club this with above condition to be consistent with other inline 
expanders.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2141:

> 2139:   }
> 2140:   BasicType elem_bt = elem_type->basic_type();
> 2141: 

Remove new line.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2144:

> 2142:   int num_elem = vlen->get_con();
> 2143:   if ((num_elem < 4) || !is_power_of_2(num_elem)) {
> 2144:     log_if_needed("  ** vlen < 4 or not power of two=%d", num_elem);

Will num_elem < 4  not be handled by L2149 since we have an implementation 
limitation to support less than 32-bit shuffle / masks.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2171:

> 2169:     use_predicate = false;
> 2170:     if(!is_masked_op ||
> 2171:        (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, 
> VecMaskNotUsed) ||

Suggestion:

       (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, 
VecMaskUseLoad) ||

src/hotspot/share/opto/vectorIntrinsics.cpp line 2188:

> 2186: 
> 2187:   if (v1 == nullptr || v2 == nullptr) {
> 2188:     return false; // operand unboxing failed

To be consistent with other expanders please emit proper error for unboxing 
failure like on following line.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426

src/hotspot/share/opto/vectorIntrinsics.cpp line 2197:

> 2195:     mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem);
> 2196:     if (mask == nullptr) {
> 2197:       log_if_needed("  ** not supported: op=selectFrom vlen=%d etype=%s 
> is_masked_op=1",

Error should an unboxing failure here.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2314643808
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277056
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277739
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766278169
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766297640
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766292679
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766303620
PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766304688

Reply via email to