On Mon, 12 Jan 2026 07:23:39 GMT, Shawn M Emery <[email protected]> wrote:

>>> > Better to align loop starting address to OptoLoopAlignment
>>>
>>> For parity, should I do this for the other labels in the file as well?
>>>
>>> > I will run the micro benchmark on AMD Turin and report back by early next
>>> > week.
>>>
>>> That would be great, thank you for doing this!
>>
>> Here are the scores on Turin.
>>
>> Baseline:
>> Benchmark                                    (algorithm)  (keyLength)  (provider)  Mode   Cnt  Score      Error  Units
>> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-512   0                        thrpt  2    62235.790         ops/s
>> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-768   0                        thrpt  2    38238.390         ops/s
>> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-1024  0                        thrpt  2    24725.512         ops/s
>>
>> Withopt:
>> Benchmark                                    (algorithm)  (keyLength)  (provider)  Mode   Cnt  Score      Error  Units
>> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-512   0                        thrpt  2    62483.697         ops/s
>> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-768   0                        thrpt  2    38464.272         ops/s
>> KeyPairGeneratorBench.MLKEM.generateKeyPair  ML-KEM-1024  0                        thrpt  2    24702.044         ops/s
>>
>> Baseline:
>> Benchmark             (algorithm)  (provider)  Mode   Cnt  Score      Error  Units
>> KEMBench.decapsulate  ML-KEM-512               thrpt  2    46416.479         ops/s
>> KEMBench.decapsulate  ML-KEM-768               thrpt  2    28516.289         ops/s
>> KEMBench.decapsulate  ML-KEM-1024              thrpt  2    19250.020         ops/s
>> KEMBench.encapsulate  ML-KEM-512               thrpt  2    60374.724         ops/s
>> KEMBench.encapsulate  ML-KEM-768               thrpt  2    36226.100         ops/s
>> KEMBench.encapsulate  ML-KEM-1024              thrpt  2    23656.223         ops/s
>>
>> Withopt:
>> Benchmark             (algorithm)  (provider)  Mode   Cnt  Score      Error  Units
>> KEMBench.decapsulate  ML-KEM-512               thrpt  2    46730.153         ops/s
>> KEMBench.decapsulate  ML-KEM-768               thrpt  2    28650.349         ops/s
>> KEMBench.decapsulate  ML-KEM-1024              thrpt  2    19390.927         ops/s
>> KEMBench.encapsulate  ML-KEM-512               thrpt  2    60238.211         ops/s
>> KEMBench.encapsulate  ML-KEM-768               thrpt  2    36454.138         ops/s
>> KEMBench.encapsulat...
>
> Thank you for sharing these results. It is disconcerting to see the drop in
> performance for i) key gen-1024, ii) encapsulation-512, and iii)
> encapsulation-1024, though I don't know the SE for these runs. During my
> testing on an AMD EPYC 9J14 96-Core Processor I consistently get noticeable
> performance increases for all ML-KEM operations:
>
> [Publish ML_KEM Benchmarks - Sheet1.pdf](https://github.com/user-attachments/files/24559070/Publish.ML_KEM.Benchmarks.-.Sheet1.pdf)

Here are the results comparing pre- and post-OptoLoopAlignment:

[Alignment ML_KEM Benchmarks - Sheet1.pdf](https://github.com/user-attachments/files/24607923/Alignment.ML_KEM.Benchmarks.-.Sheet1.pdf)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2689366713
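
As a rough cross-check of the Turin scores quoted above, the relative change per benchmark can be computed directly from the baseline and with-opt throughput. This is only a sketch: the dictionary keys are shortened labels of my own, the scores are copied from the quoted tables, and the with-opt `KEMBench.encapsulate` ML-KEM-1024 row is omitted because it is truncated in the quote.

```python
# Scores in ops/s, copied from the quoted Turin run as (baseline, with-opt).
# Keys are shortened labels chosen for this sketch, not JMH benchmark names.
# The with-opt encapsulate ML-KEM-1024 score is truncated in the quote,
# so that benchmark is left out rather than guessed.
scores = {
    "generateKeyPair ML-KEM-512": (62235.790, 62483.697),
    "generateKeyPair ML-KEM-768": (38238.390, 38464.272),
    "generateKeyPair ML-KEM-1024": (24725.512, 24702.044),
    "decapsulate ML-KEM-512": (46416.479, 46730.153),
    "decapsulate ML-KEM-768": (28516.289, 28650.349),
    "decapsulate ML-KEM-1024": (19250.020, 19390.927),
    "encapsulate ML-KEM-512": (60374.724, 60238.211),
    "encapsulate ML-KEM-768": (36226.100, 36454.138),
}


def percent_change(baseline, with_opt):
    """Relative throughput change in percent; negative means a slowdown."""
    return (with_opt - baseline) / baseline * 100.0


changes = {name: percent_change(b, w) for name, (b, w) in scores.items()}

for name, delta in changes.items():
    print(f"{name:30s} {delta:+.2f}%")
```

On these numbers every difference, positive or negative, is below 1%. With Cnt = 2 and no error column reported, differences of that size may be within run-to-run noise.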
