On Thu, 6 Mar 2025 17:37:33 GMT, Ferenc Rakoczi <d...@openjdk.org> wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-DSA algorithms (key generation, document signing, signature verification)
>> can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Accepted review comments.

src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 494:

> 492:   address generate_sha3_implCompress(StubGenStubId stub_id);
> 493:
> 494:   address generate_double_keccak();

You can hide internal helper functions (e.g. `montmulEven(*)`) if you wish. The trick is to add `MacroAssembler* _masm` as a parameter to the static (file-local) function, so that the `__` macro still works inside it. It's a trick I use to keep the header clean while still having plenty of helpers.

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 409:

> 407:     __ evmovdquq(xmm29, Address(permsAndRots, 768), Assembler::AVX_512bit);
> 408:     __ evmovdquq(xmm30, Address(permsAndRots, 832), Assembler::AVX_512bit);
> 409:     __ evmovdquq(xmm31, Address(permsAndRots, 896), Assembler::AVX_512bit);

Matter of taste, but I liked the compactness of montmulEven; e.g.

    for (int i = 0; i < 15; i++)
      __ evmovdquq(xmm(17+i), Address(permsAndRots, 64*i), Assembler::AVX_512bit);

src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 426:

> 424:     __ subl( roundsLeft, 1);
> 425:
> 426:     __ evmovdquw(xmm5, xmm0, Assembler::AVX_512bit);

Is there a pattern here that can be 'compacted' into a loop?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983903347
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983935964
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r1983937154
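
For illustration, the first two suggestions above (a file-local static helper that takes `MacroAssembler* _masm`, and a loop in place of fifteen unrolled loads) can be sketched outside of HotSpot roughly as follows. This is only a sketch under stated assumptions: the `MacroAssembler` here is a hypothetical mock that records instructions as text, registers and offsets are plain `int`s rather than `XMMRegister`/`Address`, and the helper name `load_perms_and_rots` is made up for the example. Only the `__` macro convention and the `17+i` / `64*i` pattern come from the review.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for HotSpot's MacroAssembler (assumption: the real
// class emits machine code; this mock only records a textual trace).
struct MacroAssembler {
  std::vector<std::string> code;

  // Mock of a 512-bit load: register number and byte offset as plain ints.
  void evmovdquq(int dstXmm, int offset) {
    code.push_back("evmovdquq xmm" + std::to_string(dstXmm) +
                   ", [permsAndRots+" + std::to_string(offset) + "]");
  }
};

// Mirrors the stub generator convention: `__` expands to whatever variable
// named `_masm` is in scope at the point of use.
#define __ _masm->

// File-local helper (hypothetical name). It is not declared in any header.
// Because it takes `_masm` as a parameter, the `__` macro works here exactly
// as it does inside StubGenerator member functions.
static void load_perms_and_rots(MacroAssembler* _masm) {
  // Loads xmm17..xmm31 from offsets 0, 64, ..., 896 (one 64-byte vector each).
  for (int i = 0; i < 15; i++) {
    __ evmovdquq(17 + i, 64 * i);
  }
}

// Simplified stand-in for the stub generator class (assumption: the real one
// has many more members). Only the public entry point needs a declaration.
struct StubGenerator {
  MacroAssembler* _masm;

  void generate_double_keccak() {
    load_perms_and_rots(_masm);  // helper stays out of the header
  }
};
```

With this shape, the header only has to declare `generate_double_keccak()`, and the last load emitted by the loop targets `xmm31` at offset 896, matching the unrolled line 409 quoted above.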