On Sun, 25 Feb 2024 06:27:10 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >> partially unrolls scalar loop to accumulates values from gather indices into >> a quadword(64bit) slice followed by vector permutation to place the slice >> into appropriate vector lanes, it prevents code bloating and generates >> compact JIT sequence. This coupled with savings from expansive array >> allocation in existing java implementation translates into significant >> performance of 1.5-10x gains with included micro. >> >>  >> >> >> 2) Patch was also compared against modified java fallback implementation by >> replacing temporary array allocation with zero initialized vector and a >> scalar loops which inserts gathered values into vector. But, vector insert >> operation in higher vector lanes is a three step process which first >> extracts the upper vector 128 bit lane, updates it with gather subword value >> and then inserts the lane back to its original position. This makes inserts >> into higher order lanes costly w.r.t to proposed solution. In addition >> generated JIT code for modified fallback implementation was very bulky. This >> may impact in-lining decisions into caller contexts. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Review comments resolutions. Update looks better already! I have not yet reviewed `C2_MacroAssembler::vgather_subword` because of the variable names. Increasing the reviewer count, because I'd like to have someone else review that is more familiar with the Vector API (java) code. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1584: > 1582: if (elem_bt == T_SHORT) { > 1583: Label case0, case1, case2, case3; > 1584: Label* larr[] = {&case0, &case1, &case2, &case3}; Not sure if I asked this already: why define them all here, rather than locally in the loop? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1651: > 1649: * permutation to place the slice into appropriate vector lanes > 1650: * location in destination vector. Following pseudo code describes the > 1651: * algorithm in detail: Suggestion: * Gather using hybrid algorithm which initially partially unrolls scalar loop * to accumulate values from gather indices into a quad-word(64bit) slice, a * slice may hold 8 bytes or 4 short values. This is followed by a vector * permutation to place the slice into appropriate vector lane * locations in destination vector. Following pseudo code describes the * algorithm in detail: src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1663: > 1661: * > 1662: * With each iteration, doubleword permute indices (0,1) corresponding > 1663: * to gathered quadword gets right shifted by two lane position. Suggestion: * to gathered quadword gets right shifted by two lane positions. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1674: > 1672: int vector_len, int vlen_enc) { > 1673: assert(is_subword_type(elem_ty), ""); > 1674: Label GATHER8_LOOP; Why not define it righ before the first use? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 4840: > 4838: > 4839: // Check indices are within array bounds. > 4840: // FIXME: Check index under mask controlling. This is a new FIXME in the template. I see that there are others around from before. Still, if you add a new one, I think you should at least add a bug-number that addresses this. That way, we can track the task. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1900380735 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502301147 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502303881 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502304352 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502314411 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502313404