Re: RFR: 8318650: Optimized subword gather for x86 targets. [v15]

Emanuel Peter Mon, 26 Feb 2024 05:55:34 -0800

On Mon, 26 Feb 2024 13:14:24 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:


>> Hi All,
>> 
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and 
>> AVX512 features.
>> 
>> Following is the summary of changes:-
>> 
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially 
>> partially unrolls scalar loop to accumulates values from gather indices into 
>> a quadword(64bit) slice followed by vector permutation to place the slice 
>> into appropriate vector lanes, it prevents code bloating and generates 
>> compact JIT sequence. This coupled with savings from expansive array 
>> allocation in existing java implementation translates into significant 
>> performance of 1.5-10x gains with included micro.
>> 
>> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>> 
>> 
>> 2) Patch was also compared against modified java fallback implementation by 
>> replacing temporary array allocation with zero initialized vector and a 
>> scalar loops which inserts gathered values into vector. But, vector insert 
>> operation in higher vector lanes is a three step process which first 
>> extracts the upper vector 128 bit lane, updates it with gather subword value 
>> and then inserts the lane back to its original position. This makes inserts 
>> into higher order lanes costly w.r.t to proposed solution. In addition 
>> generated JIT code for modified fallback implementation was very bulky. This 
>> may impact in-lining decisions into caller contexts.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Review comments resolutions

Reposting link to a conversation that is marked "resolved":
https://github.com/openjdk/jdk/pull/16354#discussion_r1502617942

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1672:

> 1670:                                         XMMRegister xtmp3, Register 
> rtmp,
> 1671:                                         Register mask_idx, Register 
> length,
> 1672:                                         int vector_len, int vlen_enc) {

> These are temporary variable and appropriately named.

I disagree. Names should be descriptive, and not numbered.

At least some of your registers here have only a single use. Those could be 
named:
`xtmp_xxxx` instead of `xtmp1..3`.

If for some reason you don't want to do that, then say why, and at least add a 
comment that helps the reader have a mental map from (meaningless) register 
names to their meaning. I really don't want to have to reverse-engineer things 
in a review if it can be avoided.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16354#issuecomment-1964188004
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1502640887

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v15]

Reply via email to