On Wed, 26 Feb 2025 11:32:57 GMT, Galder Zamarreño <gal...@openjdk.org> wrote:

> > That said: if we know that it is only in the high-probability cases, then 
> > we can address those separately. I would not consider it a blocking issue, 
> > as long as we file the follow-up RFE for int/max scalar case with high 
> > branch probability.
> > What would be really helpful: a list of all regressions / issues, and how 
> > we intend to deal with them. If we later find a regression that someone 
> > cares about, then we can come back to that list, and justify the decision 
> > we made here.
> 
> I'll make up a list of regressions and post it here. I won't create RFEs for 
> now. I'd rather wait until we have the list in front of us and we can decide 
> which RFEs to create.

Before listing the regressions, it's worth noting that the PR also improves 
performance in certain scenarios. I will summarise those tomorrow.

Here's a summary of the regressions:

### Regression 1
Given a loop with a long min/max reduction pattern with one side of the branch 
taken nearly 100% of the time, when Superword finds the pattern not profitable, 
HotSpot will use scalar instructions (cmov) and performance will regress.
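To make the loop shape concrete, here is a minimal Java sketch of a long max reduction whose branch goes the same way on every iteration. The class and method names are hypothetical and this is not the benchmark code from the PR; whether C2 emits cmov or vector instructions for it depends on the platform and on Superword's profitability decision.

```java
// Hypothetical sketch of the Regression 1 loop shape (not the PR's benchmark code).
public class LongMaxReduction {

    // Reduction: the accumulator `max` is carried from one iteration to the next.
    static long maxReduction(long[] a) {
        long max = Long.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, a[i]);
        }
        return max;
    }

    // Strictly increasing input: a[i] is a new maximum on every iteration,
    // so one side of the comparison is taken ~100% of the time.
    static long[] increasing(int n) {
        long[] a = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        return a;
    }

    public static void main(String[] args) {
        long[] a = increasing(10_000);
        System.out.println(maxReduction(a)); // prints 9999
    }
}
```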

Possible solutions:
a) make Superword recognise these scenarios as profitable.

### Regression 2
Given a loop with a long min/max reduction pattern with one side of the branch 
taken nearly 100% of the time, when the platform does not support the vector 
instructions needed (e.g. AVX-512 quad word vpmax/vpmin), HotSpot will use 
scalar instructions (cmov) and performance will regress.

Possible solutions:
a) find a way to use other vector instructions (vpcmp+vpblend+vmov?)
b) fall back to more suitable scalar instructions, e.g. cmp+mov, when the branch 
is very one-sided
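Option a) relies on the fact that a 64-bit max can be expressed as a signed compare that produces an all-ones/all-zeros mask, followed by a blend that selects per lane. The scalar Java sketch below only emulates those semantics lane by lane to show the idea; the helper names are hypothetical, and it says nothing about which instructions a given CPU actually offers or what C2 would generate.

```java
import java.util.Arrays;

// Hypothetical scalar emulation of a compare+blend long max (semantics only).
public class CompareBlendMax {

    // Emulates a signed 64-bit "greater than" lane compare: all ones if a > b, else zero.
    static long cmpGtMask(long a, long b) {
        return a > b ? -1L : 0L;
    }

    // Emulates a blend: take bits from `ifSet` where the mask is set, else from `ifClear`.
    static long blend(long ifClear, long ifSet, long mask) {
        return (ifSet & mask) | (ifClear & ~mask);
    }

    // Lane-wise max, computed the way a compare+blend sequence would compute it.
    static void maxCompareBlend(long[] a, long[] b, long[] r) {
        for (int i = 0; i < r.length; i++) {
            long mask = cmpGtMask(a[i], b[i]);
            r[i] = blend(b[i], a[i], mask);
        }
    }

    public static void main(String[] args) {
        long[] a = {1L, 9L, Long.MIN_VALUE, -3L};
        long[] b = {2L, 4L, Long.MAX_VALUE, -3L};
        long[] r = new long[4];
        maxCompareBlend(a, b, r);
        System.out.println(Arrays.toString(r)); // [2, 9, 9223372036854775807, -3]
    }
}
```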

### Regression 3
Given a loop with a long min/max non-reduction pattern (e.g. `longLoopMax`) 
with one side of the branch taken nearly 100% of the time, when the platform 
does not vectorize it (either due to lack of CPU instruction support, or 
Superword finding it not profitable), HotSpot will use scalar instructions 
(cmov) and performance will regress.
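For contrast with the reduction case, here is a rough Java sketch of a non-reduction long max loop in the spirit of `longLoopMax`, assuming its core is an element-wise `Math.max` over two arrays. The class and method names are hypothetical and this is not the benchmark source. Because each `r[i]` is independent, there is no loop-carried dependency; the input is chosen so the comparison always goes the same way.

```java
import java.util.Arrays;

// Hypothetical approximation of a non-reduction long max loop (not the PR's `longLoopMax`).
public class LongLoopMaxSketch {

    // Element-wise max: every r[i] depends only on a[i] and b[i].
    static void loopMax(long[] a, long[] b, long[] r) {
        for (int i = 0; i < r.length; i++) {
            r[i] = Math.max(a[i], b[i]);
        }
    }

    public static void main(String[] args) {
        int n = 8;
        long[] a = new long[n];
        long[] b = new long[n];
        long[] r = new long[n];
        // a[i] > b[i] for every element, so one side of the compare is always taken.
        for (int i = 0; i < n; i++) {
            a[i] = 2L * i + 1;
            b[i] = i;
        }
        loopMax(a, b, r);
        System.out.println(Arrays.toString(r)); // [1, 3, 5, 7, 9, 11, 13, 15]
    }
}
```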

Possible solutions:
a) find a way to use other vector instructions (e.g. `longLoopMax` vectorizes 
with AVX2 and might also vectorize with earlier instruction sets)
b) fall back to more suitable scalar instructions, e.g. cmp+mov, when the branch 
is very one-sided

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2685865807
