On Sun, 25 Feb 2024 06:27:10 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> Hi All,
>> 
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and 
>> AVX512 features.
>> 
>> Following is the summary of changes:-
>> 
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially 
>> partially unrolls scalar loop to accumulates values from gather indices into 
>> a quadword(64bit) slice followed by vector permutation to place the slice 
>> into appropriate vector lanes, it prevents code bloating and generates 
>> compact JIT sequence. This coupled with savings from expansive array 
>> allocation in existing java implementation translates into significant 
>> performance of 1.5-10x gains with included micro.
>> 
>> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>> 
>> 
>> 2) Patch was also compared against modified java fallback implementation by 
>> replacing temporary array allocation with zero initialized vector and a 
>> scalar loops which inserts gathered values into vector. But, vector insert 
>> operation in higher vector lanes is a three step process which first 
>> extracts the upper vector 128 bit lane, updates it with gather subword value 
>> and then inserts the lane back to its original position. This makes inserts 
>> into higher order lanes costly w.r.t to proposed solution. In addition 
>> generated JIT code for modified fallback implementation was very bulky. This 
>> may impact in-lining decisions into caller contexts.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Review comments resolutions.

Update looks better already!
I have not yet reviewed `C2_MacroAssembler::vgather_subword` because of the 
variable names.

Increasing the reviewer count, because I'd like to have someone else review 
that is more familiar with the Vector API (java) code.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1584:

> 1582:   if (elem_bt == T_SHORT) {
> 1583:     Label case0, case1, case2, case3;
> 1584:     Label* larr[] = {&case0, &case1, &case2, &case3};

Not sure if I asked this already: why define them all here, rather than locally 
in the loop?

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1651:

> 1649:  * permutation to place the slice into appropriate vector lanes
> 1650:  * location in destination vector. Following pseudo code describes the
> 1651:  * algorithm in detail:

Suggestion:

 * Gather using hybrid algorithm which initially partially unrolls scalar loop
 * to accumulate values from gather indices into a quad-word(64bit) slice, a
 * slice may hold 8 bytes or 4 short values. This is followed by a vector
 * permutation to place the slice into appropriate vector lane
 * locations in destination vector. Following pseudo code describes the
 * algorithm in detail:

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1663:

> 1661:  *
> 1662:  * With each iteration, doubleword permute indices (0,1) corresponding
> 1663:  * to gathered quadword gets right shifted by two lane position.

Suggestion:

 * to gathered quadword gets right shifted by two lane positions.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1674:

> 1672:                                         int vector_len, int vlen_enc) {
> 1673:   assert(is_subword_type(elem_ty), "");
> 1674:   Label GATHER8_LOOP;

Why not define it righ before the first use?

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
 line 4840:

> 4838: 
> 4839:         // Check indices are within array bounds.
> 4840:         // FIXME: Check index under mask controlling.

This is a new FIXME in the template. I see that there are others around from 
before. Still, if you add a new one, I think you should at least add a 
bug-number that addresses this. That way, we can track the task.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1900380735
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502301147
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502303881
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502304352
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502314411
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502313404

Reply via email to