> Hi All,
>
> This patch optimizes the sub-word gather operation for x86 targets with
> AVX2 and AVX512 features.
>
> Following is the summary of changes:
>
> 1) Intrinsify sub-word gather using a hybrid algorithm which first
> partially unrolls a scalar loop to accumulate values from the gather
> indices into a quadword (64-bit) slice, followed by a vector permutation
> to place the slice into the appropriate vector lanes. This prevents code
> bloat and generates a compact JIT sequence. Coupled with the savings from
> the expensive array allocation in the existing Java implementation, this
> translates into significant performance gains of 1.5-10x with the included
> micro on Intel Atom family CPUs and with JVM option UseAVX=2.
>
> 2) For AVX512 targets, the algorithm uses integral gather instructions to
> load values from normalized indices which are multiples of the integer
> size, followed by shuffling and packing the exact sub-word values from the
> integral lanes.
>
> 3) The patch was also compared against a modified Java fallback
> implementation, obtained by replacing the temporary array allocation with
> a zero-initialized vector and a scalar loop which inserts gathered values
> into the vector. However, a vector insert operation into the higher vector
> lanes is a three-step process which first extracts the upper 128-bit
> vector lane, updates it with the gathered sub-word value and then inserts
> the lane back at its original position. This makes inserts into
> higher-order lanes costly w.r.t. the proposed solution. In addition, the
> generated JIT code for the modified fallback implementation was very
> bulky, which may impact inlining decisions into caller contexts.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin
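
For readers unfamiliar with the approach in item 1, the following is a minimal scalar sketch in plain Java of the "accumulate into a 64-bit slice" idea. It is not the code from the patch: the class and method names are hypothetical, and the final unpacking loop only stands in for the vector permutation that the intrinsic performs in registers. The output array is used purely to make the result observable, whereas the patch is described as avoiding such an allocation.

```java
// Illustrative sketch only; names are hypothetical and not taken from the patch.
final class SubwordGatherSketch {

    // Gathers 8 bytes src[base + idx[off + i]] (i = 0..7) into one 64-bit
    // slice, with lane 0 in the least significant byte.
    static long gatherSlice(byte[] src, int base, int[] idx, int off) {
        long slice = 0L;
        for (int i = 0; i < 8; i++) {
            long b = src[base + idx[off + i]] & 0xFFL; // zero-extend the byte
            slice |= b << (8 * i);                     // place it in lane i
        }
        return slice;
    }

    // Gathers 'lanes' bytes (assumed to be a multiple of 8) slice by slice.
    // Writing each slice to the output array stands in for the vector
    // permutation that moves a slice into its destination lanes.
    static byte[] gatherBytes(byte[] src, int base, int[] idx, int lanes) {
        byte[] out = new byte[lanes];
        for (int s = 0; s < lanes; s += 8) {
            long slice = gatherSlice(src, base, idx, s);
            for (int i = 0; i < 8; i++) {
                out[s + i] = (byte) (slice >>> (8 * i));
            }
        }
        return out;
    }
}
```

Each 64-bit slice needs only eight scalar loads and a handful of shift/or operations, which is the reason given above for the compact generated sequence compared with per-lane vector inserts.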

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:

 - Generalizing masked sub-gather support.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650
 - Fix incorrect comment
 - Review comments resolutions.
 - Review comments resolutions.
 - Review comments resolutions.
 - Restricting masked sub-word gather to AVX512 target to align with integral gather support.
 - Review comments resolution.
 - 8318650: Optimized subword gather for x86 targets.

-------------

Changes: https://git.openjdk.org/jdk/pull/16354/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=11
  Stats: 1216 lines in 32 files changed: 1168 ins; 20 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/16354.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354

PR: https://git.openjdk.org/jdk/pull/16354