On Fri, 9 Jan 2026 22:59:16 GMT, Mohamed Issa <[email protected]> wrote:
>> Intel® AVX10 ISA [1] extensions added new floating point comparison >> instructions. They set the EFLAGS register so that relationships can be >> tested independently to avoid extra checks when one of the inputs is NaN. >> >> Most of the work is covered in the architecture definition (`x86.ad`) file. >> A new comparison operand was created to be used by new CMove and JMP >> definitions with the APX specific portions of the CMove section being >> updated to rely on the new instructions because both sets of instructions >> are always expected to be available on the same platform. New floating point >> comparison definitions were also added. >> >> This change uses the new AVX10.2 (UCOMXSS or UCOMXSD) instructions on >> supported platforms to avoid the extra handling required with existing >> (UCOMISS or UCOMISD) instructions. To make sure no new failures were >> introduced, tier1, tier2, and tier3 tests were run on builds with and >> without the changes. Additionally, the JTREG tests listed below were used to >> verify correctness with `-XX:-UseAPX` / `-XX:+UseAPX` options. The baseline >> build used is [OpenJDK >> v26-b26](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B26). >> >> 1. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/CMoveLConstants.java` >> 2. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java` >> 3. >> `jtreg:test/hotspot/jtreg/compiler/intrinsics/math/TestSignumIntrinsic.java` >> 4. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java` >> >> Finally, the JMH micro-benchmark listed below was updated to separately >> exercise CMove and JMP code paths. >> >> 1. `micro:test/micro/org/openjdk/bench/java/lang/FPComparison.java` >> >> [1] >> https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10 > > Mohamed Issa has updated the pull request incrementally with one additional > commit since the last revision: > > Remove extra jump instruction in signum floating point and unify three-way > floating point comparison logic in x86.ad src/hotspot/cpu/x86/x86.ad line 1706: > 1704: Label done; > 1705: __ setcc(Assembler::above, dst); > 1706: __ jcc(Assembler::aboveEqual, done); It would be better to keep the original sequence with optimization: __ movl(dst, -1); __ jcc(Assembler::below, done); __ setcc(Assembler::notEqual, dst); This would work for both AVX10_2 and prior as the Assembler::below covers both unordered and less. src/hotspot/cpu/x86/x86.ad line 9186: > 9184: %} > 9185: > 9186: instruct cmovI_regUCFE(cmpOpUCFE cop, rFlagsRegUCFE cr, rRegI dst, > rRegI src) %{ The following instructs could be removed and only the ndd version need to be kept: cmovI_regUCFE cmovI_memUCFE cmovN_regUCFE cmovP_regUCFE cmovL_regUCFE cmovL_memUCFE cmpOpUCFE/rFlagsRegUCFE should only match when both AVX10_2 and APX is available otherwise fallback to cmpOpUCF/cmpOfUCF2/rFlagsRegUCF. We can then remove the predicates from cmovI_reg/cmovI_regUCFE_ndd etc. This is to avoid explosion of ad file instructs for various combination of APX and AVX10_2. src/hotspot/cpu/x86/x86.ad line 14609: > 14607: ins_encode %{ > 14608: __ ucomxss($src1$$XMMRegister, $src2$$XMMRegister); > 14609: emit_cmpfp3(masm, $dst$$Register); It looks to me that the only difference between cmpF_reg and cmpF_regE is ucomxss vs ucomiss. In which case it is better to keep only the original cmpF_reg and remove cmpF_regE. Likewise for cmpF_memE, cmpF_immE, cmpD_regE, cmpD_memE, cmpD_immE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28337#discussion_r2696383810 PR Review Comment: https://git.openjdk.org/jdk/pull/28337#discussion_r2696362925 PR Review Comment: https://git.openjdk.org/jdk/pull/28337#discussion_r2684261529
