Hi All,

This patch optimizes the sub-word gather operation for x86 targets with AVX2 and 
AVX512 features.

Following is a summary of the changes:

1) Intrinsify sub-word gather with a high-performance backend implementation 
based on a hybrid algorithm. It first partially unrolls a scalar loop to 
accumulate values from the gather indices into a quadword (64-bit) slice, then 
uses a vector permutation to place the slice into the appropriate vector lanes. 
This prevents code bloat and generates a compact JIT sequence. Coupled with the 
savings from avoiding the expensive array allocation in the existing Java 
implementation, this translates into significant performance gains of 1.3-5x 
with the included microbenchmark.
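
For readers less familiar with the approach, here is a minimal scalar model of 
the idea for byte elements. It is only a sketch of the semantics, not the JIT 
sequence emitted by the patch: the class and method names are hypothetical, the 
little-endian packing order is an assumption, and the final copy loop merely 
stands in for the vector permutation step.

```java
// Hypothetical scalar model of the hybrid sub-word gather; not code from the patch.
final class SubwordGatherSketch {

    // Accumulates up to 8 gathered bytes into one 64-bit slice.
    // Element i of the slice occupies bits [8*i, 8*i+7] (assumed little-endian order).
    static long gatherSlice(byte[] src, int base, int[] idx, int idxOff, int count) {
        long slice = 0L;
        for (int i = 0; i < count; i++) {
            long b = src[base + idx[idxOff + i]] & 0xFFL; // mask off sign extension
            slice |= b << (8 * i);
        }
        return slice;
    }

    // Gathers `lanes` bytes, one 64-bit slice at a time.
    static byte[] gatherBytes(byte[] src, int base, int[] idx, int lanes) {
        byte[] out = new byte[lanes];
        for (int lane = 0; lane < lanes; lane += 8) {
            int count = Math.min(8, lanes - lane);
            long slice = gatherSlice(src, base, idx, lane, count);
            // Stand-in for the vector permutation that places the slice
            // into its destination lanes.
            for (int i = 0; i < count; i++) {
                out[lane + i] = (byte) (slice >>> (8 * i));
            }
        }
        return out;
    }
}
```

The point of the model is that each inner loop touches only a general-purpose 
register sized value, so no temporary array is needed and the vector is written 
once per slice rather than once per element.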


![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)


2) The patch was also compared against a modified Java fallback implementation 
that replaces the temporary array allocation with a zero-initialized vector and 
a scalar loop which inserts the gathered values into the vector. However, a 
vector insert into the higher vector lanes is a three-step process: it first 
extracts the upper 128-bit lane, updates it with the gathered sub-word value, 
and then inserts the lane back into its original position. This makes inserts 
into higher-order lanes costly compared to the proposed solution. In addition, 
the JIT code generated for the modified fallback implementation was very bulky, 
which may impact inlining decisions in caller contexts.
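
The cost asymmetry described in 2) can be illustrated with a small scalar 
model. This is only a sketch under stated assumptions: the vector is modelled 
as a plain byte array, the class and method names are hypothetical, and the 
returned step count is an illustrative stand-in for the number of operations, 
not a measured instruction count.

```java
import java.util.Arrays;

// Hypothetical model of a per-element vector insert; not code from the patch.
final class LaneInsertModel {
    static final int LANE_BYTES = 16; // one 128-bit lane

    // Inserts `value` at byte position `pos` of the modelled vector and
    // returns the number of modelled steps that were needed.
    static int insert(byte[] vec, int pos, byte value) {
        if (pos < LANE_BYTES) {
            vec[pos] = value; // lowest lane: a single insert
            return 1;
        }
        int laneStart = (pos / LANE_BYTES) * LANE_BYTES;
        // Step 1: extract the 128-bit lane that contains the element.
        byte[] lane = Arrays.copyOfRange(vec, laneStart, laneStart + LANE_BYTES);
        // Step 2: update the extracted lane with the gathered value.
        lane[pos - laneStart] = value;
        // Step 3: insert the lane back at its original position.
        System.arraycopy(lane, 0, vec, laneStart, LANE_BYTES);
        return 3;
    }
}
```

In this model every element that lands above the lowest 128-bit lane pays the 
extract/update/insert sequence, which is the per-element overhead the 
slice-based approach avoids.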

3) Some minor adjustments to the existing gather instruction patterns for 
double/quad words.


Kindly review and share your feedback.


Best Regards,
Jatin

-------------

Commit messages:
 - 8318650: Optimized subword gather for x86 targets.

Changes: https://git.openjdk.org/jdk/pull/16354/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8318650
  Stats: 1186 lines in 31 files changed: 1123 ins; 34 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/16354.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354

PR: https://git.openjdk.org/jdk/pull/16354
