Re: RFR: 8293198: [vectorapi] Improve the implementation of VectorMask.indexInRange()

Jatin Bhateja Wed, 01 Feb 2023 22:41:48 -0800

On Thu, 2 Feb 2023 03:42:34 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:


>> Your patch re-organized the code to take care of following cases :-
>> 1.   offset < 0                         : Still falling over to old code.
>> 2.   offset >= limit                  : Special case optimized 
>> 3.   (limit - offset) >= length  :  Most general case.
>> 4.   Partial in range check       :  New intrinsic, existing java side 
>> implementation in checkIndex0 is also composed of intrinsified calls,  
>> infrequently taken paths will anyways lead to uncommon traps.
>> 
>> Have you tried introducing just case 3 (first) and then case 2 in existing 
>> indexInRange implementation.
>
>> Thanks for the review @jatin-bhateja !
>> 
>> > Have you tried introducing just case 3 (first) and then case 2 in existing 
>> > indexInRange implementation.
>> 
>> Yes, I tried with this way as well, together with the performance testing 
>> compared with current version. The result is:
>> 
>> 1. For cases that the array size is aligned with the vector size, the 
>> performance is almost the same with each other.
>> 2. For cases that the array size is not aligned (which needs the masked 
>> operation), the performance with current version can improve about 5%~6% 
>> compared with the way as you suggested. But the improvement is limited to 
>> the cases with smaller vector size, whose tail loop size is bigger.
>> 
>> So, adding the new intrinsic has no side effect for me, except for the more 
>> complex java side code and one additional intrinsic.
> 
> Another benefit is the code change will be simpler, that almost all the files 
> except the `AbstractMask.java` do not need to be changed. I will make the 
> modification and rerun the benchmarks to decide whether it's necessary to add 
> the new intrinsics. Thanks!

While you talked about Java side changes, I found another opportunity for 
optimization in checkIndex0 implementation, in the following code snippet from 
checkIndex0 method, indexLimit is guaranteed to be a +ve value.


         int indexLimit = Math.max(0, Math.min(length - offset, vlength));
         VectorMask<E> badMask =
            iota.compare(GE, iota.broadcast(indexLimit)); 

For float/double iota vectors, subsequent broadcast operation accepting long 
argument checks for precision loss before broadcasting floating point value.  


       long longToElementBits(long value) {
            // Do the conversion, and then test it for failure.
            float e = (float) value;
            if ((long) e != value) {
                throw badElementBits(value, e);
            }
            return toBits(e);
        }`

 
This can be saved by doing a prior casting of indexLimit to floating point 
value before calling broadcast, effectively we may need to move checkIndex0 to 
template generated abstract vector classes.

-------------

PR: https://git.openjdk.org/jdk/pull/12064

Re: RFR: 8293198: [vectorapi] Improve the implementation of VectorMask.indexInRange()

Reply via email to