On Mon, 17 Nov 2025 23:35:44 GMT, Volodymyr Paprotski <[email protected]> wrote:

>> - New AVX2 intrinsics are 1.6x-6.9x faster than Java baseline
>> - `SignatureBench.MLDSA` is 1.2x-2.2x faster
>> - Note: there are no AVX2-SHA3 intrinsics yet (being reviewed: https://github.com/vpaprotsk/jdk/pull/7)
>> - AVX512 intrinsic improvements are 1.24x-1.5x faster than the current version
>> - `SignatureBench.MLDSA` is up to 5% faster, never slower
>>
>> Note on intrinsic:
>> - The emitted (existing) AVX512 assembler was not "significantly" changed; mostly more efficient instruction selection and tighter register allocation, which allowed removal of NTT loop and stack spill.
>> - Code was refactored to allow reuse of same assembler (as possible) for AVX512 and AVX2
>>
>> Tests and benchmarks:
>> - Added a fuzz test to ensure Java and intrinsic produce exactly the same result
>> - Added benchmark to measure the performance of the intrinsic itself
>>
>> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java"
>> make test TEST="test/jdk/sun/security/provider/acvp/Launcher.java test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java" JTREG="JAVA_OPTIONS=-XX:UseAVX=2"
>> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+UseDilithiumIntrinsics;FORK=1"
>> make test TEST="micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA" MICRO="JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:-UseDilithiumIntrinsics;FORK=1"
>
> Volodymyr Paprotski has updated the pull request incrementally with two additional commits since the last revision:
>
>  - whitespace
>  - address first comments

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1283:

> 1281:     // r1 = r1 & quotient; // copy 0 or keep as is, using EqMsk as filter
> 1282:     for (int i = 0; i < regCnt; i++) {
> 1283:       // FIXME: replace with void evmovdqul(Address dst, KRegister mask, XMMRegister src, bool merge, int vector_len);?

Is the FIXME a leftover?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2547729185
