On Mon, 9 Mar 2026 19:36:05 GMT, Mohamed Issa <[email protected]> wrote:

>> Although the scalar AVX10 floating point min/max instructions (VMINMAXSD, 
>> VMINMAXSS, VMINMAXSH) are compact, it's better not to use them in reduction 
>> loops. This is because of serial data dependencies that get triggered across 
>> loop iterations. An alternate implementation using comparisons and jumps 
>> leverages branch prediction and limits the effects of data dependencies to 
>> cheaper instructions (e.g., MOV). Please note that this method is already 
>> used for non-AVX10 min/max reduction loop scenarios.
>> 
>> With that background provided, these changes remove AVX10 floating point 
>> min/max instructions from single and double precision floating point 
>> reduction loops. They are replaced by the separate instruction sequence 
>> described above. Currently, min/max half precision floating point reduction 
>> loops aren't detectable, so they will be handled in a separate PR. There is 
>> also some code cleanup to remove unused instruction definitions while also 
>> adding necessary supporting infrastructure. The JTREG tests listed below 
>> were used to verify correctness with the recommended JVM options mentioned 
>> in corresponding source files. All modifications and tests used [OpenJDK 
>> v27-b12](https://github.com/openjdk/jdk/releases/tag/jdk-27%2B12) as the 
>> baseline build.
>> 
>> 1. `jtreg:test/jdk/jdk/incubator/vector/DoubleVector64Tests.java`
>> 2. `jtreg:test/jdk/jdk/incubator/vector/DoubleVector128Tests.java`
>> 3. `jtreg:test/jdk/jdk/incubator/vector/DoubleVector256Tests.java`
>> 4. `jtreg:test/jdk/jdk/incubator/vector/DoubleVector512Tests.java`
>> 5. `jtreg:test/jdk/jdk/incubator/vector/DoubleVectorMaxTests.java`
>> 6. `jtreg:test/jdk/jdk/incubator/vector/FloatVector64Tests.java`
>> 7. `jtreg:test/jdk/jdk/incubator/vector/FloatVector128Tests.java`
>> 8. `jtreg:test/jdk/jdk/incubator/vector/FloatVector256Tests.java`
>> 9. `jtreg:test/jdk/jdk/incubator/vector/FloatVector512Tests.java`
>> 10. `jtreg:test/jdk/jdk/incubator/vector/FloatVectorMaxTests.java`
>> 11. 
>> `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java`
>> 12. 
>> `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java`
>> 13. `jtreg:test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java`
>> 14. 
>> `jtreg:test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java`
>> 
>> Finally, the JMH micro-benchmarks listed below were updated to ensure all 
>> code paths are exercised.
>> 
>> 1. 
>> `micro:test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java`
>> 2. `mi...
>
> Mohamed Issa has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains five commits:
> 
>  - Merge branch 'master' into user/missa-prime/avx10_2
>  - Remove half precision min/max reduction definitions and adjust 
> corresponding benchmarks.
>  - Use alternative instruction flow for half precision reduction loops and 
> add supporting infrastructure.
>  - Merge branch 'master' into user/missa-prime/avx10_2
>  - Replace scalar AVX10.2 floating point min/max instructions with more 
> efficient sequence

src/hotspot/cpu/x86/x86.ad line 1743:

> 1741: // Math.min()        # Math.max()
> 1742: // -----------------------------
> 1743: // (v)ucomis[s/d].   #

Add sh to the comment here.

src/hotspot/cpu/x86/x86.ad line 1763:

> 1761:   } else {
> 1762:     emit_fp_ucom_double(masm, a, b);
> 1763:   }

It would be good to have a function like emit_fp_ucom(masm, pt, a, b) and use 
that here.
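
To make the suggestion concrete, here is a minimal, self-contained sketch of
the dispatch I have in mind. It is only a model: `MockMasm`, `FpPrec`, and
`emit_fp_ucom_single` are hypothetical stand-ins (the real helpers take the
HotSpot assembler and register types, and the single precision helper's actual
name is not shown in this hunk), and the mock records mnemonics as strings
instead of encoding instructions.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
enum class FpPrec { Single, Double };

struct MockMasm {
  std::vector<std::string> emitted;  // records mnemonics instead of encoding them
};

// Mock of a per-precision helper (name assumed for this sketch).
void emit_fp_ucom_single(MockMasm& masm, int a, int b) {
  masm.emitted.push_back("ucomiss xmm" + std::to_string(a) +
                         ", xmm" + std::to_string(b));
}

// Mock of the helper called in the quoted hunk.
void emit_fp_ucom_double(MockMasm& masm, int a, int b) {
  masm.emitted.push_back("ucomisd xmm" + std::to_string(a) +
                         ", xmm" + std::to_string(b));
}

// Suggested single entry point: the branch on precision lives in one place,
// so call sites no longer need their own if/else.
void emit_fp_ucom(MockMasm& masm, FpPrec pt, int a, int b) {
  switch (pt) {
    case FpPrec::Single: emit_fp_ucom_single(masm, a, b); break;
    case FpPrec::Double: emit_fp_ucom_double(masm, a, b); break;
  }
}
```

With something along these lines, the if/else quoted above would shrink to a
single `emit_fp_ucom(masm, pt, a, b)` call.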

src/hotspot/cpu/x86/x86.ad line 1791:

> 1789:   } else {
> 1790:     __ movdbl(dst, a);
> 1791:   }

Likewise, a function movfp(prec, dst, src) would be good to define and use here.
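
A rough sketch of such a move helper, under the same caveats: `MockMasm` and
`FpPrec` are hypothetical, registers are modeled as plain integers, and the
`movflt`/`movdbl` members here are mocks that only log a mnemonic. The point is
just that the precision check moves into the helper.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
enum class FpPrec { Single, Double };

struct MockMasm {
  std::vector<std::string> emitted;  // log of "emitted" instructions

  // Mock per-precision moves; they only record what would be emitted.
  void movflt(int dst, int src) { log("movss", dst, src); }
  void movdbl(int dst, int src) { log("movsd", dst, src); }

  // Suggested helper: pick the move that matches the operand precision.
  void movfp(FpPrec prec, int dst, int src) {
    if (prec == FpPrec::Single) {
      movflt(dst, src);
    } else {
      movdbl(dst, src);
    }
  }

 private:
  void log(const char* mnemonic, int dst, int src) {
    emitted.push_back(std::string(mnemonic) + " xmm" + std::to_string(dst) +
                      ", xmm" + std::to_string(src));
  }
};
```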

src/hotspot/cpu/x86/x86.ad line 7408:

> 7406: 
> 7407: // max = java.lang.Math.max(float a, float b)
> 7408: instruct maxF_reg_avx10_2(regF dst, regF a, regF b)

We can merge the maxF_reg_avx10_2 and minF_reg_avx10_2 into one instruct, say 
minmaxF_reg_avx10_2, with two match rules:
match(Set dst (MaxF a b));
match(Set dst (MinF a b));
Likewise for double minmax.
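
A shared body is plausible because min and max differ only in the comparison
direction and in which signed zero wins. The plain C++ sketch below models that
at the level of Java's Math.min/Math.max semantics. It is not the instruction
sequence from this PR, and `MinMaxOp` and `java_minmax` are hypothetical names;
in the .ad file the shared encoding would presumably key off the matched ideal
opcode rather than an enum.

```cpp
#include <cmath>

// Hypothetical selector standing in for the matched node's opcode.
enum class MinMaxOp { Min, Max };

// Models Java's Math.min/Math.max for float with one shared body:
// NaN in either input yields NaN, and -0.0f is ordered below +0.0f.
float java_minmax(MinMaxOp op, float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  if (a == b) {
    // Equal values can still differ in the sign of zero:
    // min prefers the negative zero, max prefers the positive one.
    const bool want_negative = (op == MinMaxOp::Min);
    return (std::signbit(a) == want_negative) ? a : b;
  }
  // The only other difference between the two operations is the
  // direction of the comparison.
  const bool take_a = (op == MinMaxOp::Max) ? (a > b) : (a < b);
  return take_a ? a : b;
}
```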

src/hotspot/cpu/x86/x86.ad line 7420:

> 7418: %}
> 7419: 
> 7420: instruct maxF_reduction_reg_avx10_2(regF dst, regF a, regF b, regF 
> xtmp, rRegI rtmp, rFlagsReg cr)

We can merge the maxF_reduction_reg_avx10_2 and minF_reduction_reg_avx10_2 into 
one instruct, say minmaxF_reduction_reg_avx10_2, with two match rules:
match(Set dst (MaxF a b));
match(Set dst (MinF a b));
Likewise for double minmax.

src/hotspot/cpu/x86/x86.ad line 7435:

> 7433: 
> 7434: // max = java.lang.Math.max(float a, float b)
> 7435: instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, 
> legRegF atmp, legRegF btmp)

We can merge the maxF_reg and minF_reg into one instruct, say minmaxF_reg, with 
two match rules:
match(Set dst (MaxF a b));
match(Set dst (MinF a b));
Likewise for double minmax.

src/hotspot/cpu/x86/x86.ad line 7448:

> 7446: %}
> 7447: 
> 7448: instruct maxF_reduction_reg(legRegF dst, legRegF a, legRegF b, legRegF 
> xtmp, rRegI rtmp, rFlagsReg cr)

We can merge the maxF_reduction_reg and minF_reduction_reg into one instruct, 
say minmaxF_reduction_reg, with two match rules:
match(Set dst (MaxF a b));
match(Set dst (MinF a b));

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932578996
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932554904
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932564600
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932617343
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932623330
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932640106
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932642570
