> Hi All,
>
> This patch optimizes sub-word gather operations for x86 targets with AVX2 and
> AVX512 features.
>
> Following is a summary of the changes:
>
> 1) Intrinsify sub-word gather with a high-performance backend implementation
> based on a hybrid algorithm. It first partially unrolls a scalar loop to
> accumulate the values at the gather indices into a quadword (64-bit) slice,
> then uses a vector permutation to place the slice into the appropriate vector
> lanes. This prevents code bloat and generates a compact JIT sequence. Coupled
> with the savings from avoiding the expensive array allocation in the existing
> Java implementation, this translates into significant performance gains of
> 1.3-5x with the included microbenchmark.
>
> 2) The patch was also compared against a modified Java fallback
> implementation that replaces the temporary array allocation with a
> zero-initialized vector and a scalar loop which inserts the gathered values
> into the vector. However, a vector insert into the higher vector lanes is a
> three-step process: it first extracts the upper 128-bit lane, updates it with
> the gathered sub-word value, and then inserts the lane back into its original
> position. This makes inserts into higher-order lanes costly compared to the
> proposed solution. In addition, the generated JIT code for the modified
> fallback implementation was very bulky, which may impact inlining decisions
> in caller contexts.
>
> 3) Some minor adjustments to the existing gather instruction patterns for
> double/quad words.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin
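For readers following the description in point 1, here is a minimal scalar sketch of the "accumulate into a 64-bit slice, then place the slice" idea. It is only a plain-Java model, not the HotSpot backend code from the patch: the class and method names (`SubwordGatherSketch`, `gatherBytes`, `unpack`) are hypothetical, it assumes byte elements, an index count that is a multiple of 8, and little-endian ordering of bytes within a slice, and a `long[]` stands in for the vector whose lanes the real implementation fills with a permutation.

```java
public class SubwordGatherSketch {

    // Gathers one byte per index from src (starting at base) and packs every
    // 8 gathered bytes into a single 64-bit slice. Assumption: indices.length
    // is a multiple of 8.
    static long[] gatherBytes(byte[] src, int base, int[] indices) {
        long[] slices = new long[indices.length / 8];
        for (int s = 0; s < slices.length; s++) {
            long slice = 0L;
            // Partially unrolled scalar accumulation in the real backend;
            // written here as a simple inner loop.
            for (int j = 0; j < 8; j++) {
                long b = src[base + indices[s * 8 + j]] & 0xFFL;
                slice |= b << (8 * j);
            }
            // Stand-in for the vector permute that moves the slice into its
            // final lane position.
            slices[s] = slice;
        }
        return slices;
    }

    // Expands the packed slices back into bytes so the result can be checked.
    static byte[] unpack(long[] slices) {
        byte[] out = new byte[slices.length * 8];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (slices[i / 8] >>> (8 * (i % 8)));
        }
        return out;
    }
}
```

The point of the sketch is that each group of eight sub-word loads is followed by a single placement step, rather than one extract/update/insert sequence per element as in the fallback described in point 2.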
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Restricting masked sub-word gather to AVX512 target to align with integral gather support.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16354/files
  - new: https://git.openjdk.org/jdk/pull/16354/files/d0d6f455..86783403

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=01-02

  Stats: 93 lines in 2 files changed: 0 ins; 92 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16354.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354

PR: https://git.openjdk.org/jdk/pull/16354