On Tue, 20 May 2025 02:22:13 GMT, Xiaohong Gong <[email protected]> wrote:
>> Ping again~ could any one please take a look at this PR? Thanks a lot!
>
>> Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing
>> and get back.
>>
>> Do you have any idea about following regression?
>>
>> ```
>> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms
>> 64 55844.814 48311.847 0.86
>> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms
>> 256 15139.459 13009.848 0.85
>> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms
>> 1024 3861.834 3284.944 0.85
>> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms
>> 4096 938.665 817.673 0.87
>> ```
>>
>> Best Regards
>
> Yes, I also observed such regression. After analyzing, I found it was caused
> by the java side changes, which influences the range check elimination inside
> `IntVector.fromArray()` and `ByteVector.intoArray()` in the benchmark. The
> root cause is the counted loop in following benchmark case is not recognized
> by compiler as expected:
>
> public void microByteGather256() {
> for (int i = 0; i < SIZE; i += B256.length()) {
> ByteVector.fromArray(B256, barr, 0, index, i)
> .intoArray(bres, i);
> }
> }
> ```
> The loop iv phi node is not recognized successfully when C2 recognize the
> counted loop pattern, because it was casted twice with `CastII` in this case.
> The ideal graph looks like:
>
> Loop
> \
> \ / -----------------------------
> \ / |
> Phi |
> | |
> CastII |
> | |
> CastII |
> | |
> \ ConI |
> \ | |
> AddVI |
> |-------------------------|
>
>
> Relative code is
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1667.
>
>
> Befor...
> @XiaohongGong Thanks for splitting this one out, and for investigating the
> regressions here.
>
> Putting the permalink here, fixed to the current change (the link you pasted
> will always refer to the newest, which may later on point to the wrong line
> when lines above are inserted / deleted):
>
> https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661
>
> I wonder if we should just use `Node::uncast` there? But I'm quite unsure
> about that.
Sounds good to me. I will have a deep investigation for it. Thanks!
> > Yes, I also observed such regression.
> > It would be nice if you proactively mentioned regressions, so it does not
> > have to be pointed out by reviewers.
>
> For me, it could be ok to fix it in a follow-up patch. I think we are too
> close to RDP1 for JDK25 now anyway, and so we could push this patch here into
> JDK26, and then we have enough time in JDK26 to investigate the regression.
> Even better would be if we could do the other patch first, so we never even
> encounter a regression.
Sounds good to me. Thanks!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2893026228