Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13]

Sandhya Viswanathan Mon, 07 Apr 2025 17:13:19 -0700

On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi <[email protected]> wrote:


>> By using the AVX-512 vector registers the speed of the computation of the 
>> ML-DSA algorithms (key generation, document signing, signature verification) 
>> can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Reacting to comment by Sandhya.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 802:

> 800:   __ evpbroadcastd(zero, scratch, Assembler::AVX_512bit); // 0
> 801:   __ addl(scratch, 1);
> 802:   __ evpbroadcastd(one, scratch, Assembler::AVX_512bit); // 1

A better way to initialize (0, 1, -1) vectors is:
// load 0 into int vector
vpxor(zero, zero, zero, Assembler::AVX_512bit);
// load -1 into int vector
vpternlogd(minusOne, 0xff, minusOne, minusOne, Assembler::AVX_512bit);
// load 1 into int vector
vpsubd(one, zero, minusOne, Assembler::AVX_512bit);

Here minusOne could be xmm31.

A broadcast from a general-purpose register to an xmm register is more expensive than these register-only instructions.
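
To see why the three instructions suggested above yield 0, -1, and 1 regardless of what the registers held before, here is a rough scalar model of a single 32-bit lane in plain C++. It is only a sketch, not HotSpot code: the names ternlog_lane, LaneConstants, and make_constants are hypothetical, and stale stands for whatever garbage the register contained beforehand.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 32-bit lane of VPTERNLOGD: at every bit position the
// three input bits (a, b, c) form an index 0..7 that selects one bit of imm8.
uint32_t ternlog_lane(uint8_t imm8, uint32_t a, uint32_t b, uint32_t c) {
    uint32_t result = 0;
    for (int bit = 0; bit < 32; ++bit) {
        uint32_t idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        result |= ((static_cast<uint32_t>(imm8) >> idx) & 1u) << bit;
    }
    return result;
}

// Bit patterns of the three constants for one lane (hypothetical helper type).
struct LaneConstants {
    uint32_t zero;
    uint32_t minusOne;  // 0xFFFFFFFF, the two's complement pattern of -1
    uint32_t one;
};

// Mirrors the suggested sequence for a lane whose prior contents are `stale`.
LaneConstants make_constants(uint32_t stale) {
    LaneConstants c;
    c.zero = stale ^ stale;                              // vpxor(zero, zero, zero)
    c.minusOne = ternlog_lane(0xff, stale, stale, stale); // vpternlogd(..., 0xff, ...)
    c.one = c.zero - c.minusOne;                         // vpsubd(one, zero, minusOne)
    assert(c.one == 1u);                                 // 0 - (-1) wraps to 1
    return c;
}
```

With imm8 = 0xff every truth-table entry is 1, so the result is all ones whatever the inputs are, and no value has to come from a general-purpose register.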

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 982:

> 980:   __ evporq(xmm19, k0, xmm19, xmm23, false, Assembler::AVX_512bit);
> 981: 
> 982:   __ evpsubd(xmm12, k0, zero, one, false, Assembler::AVX_512bit); // -1

The -1 initialization could be done outside the loop.

src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 1015:

> 1013:   __ addptr(lowPart, 4 * XMMBYTES);
> 1014:   __ cmpl(len, 0);
> 1015:   __ jcc(Assembler::notEqual, L_loop);

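It looks to me that the subl and cmpl could be merged, since subl already sets the flags that jcc tests, as long as it is the last flag-setting instruction before the branch: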
  __ addptr(highPart, 4 * XMMBYTES);
  __ addptr(lowPart, 4 * XMMBYTES);
  __ subl(len, 4 * XMMBYTES);
  __ jcc(Assembler::notEqual, L_loop);
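
As a rough scalar analogy for the merged loop tail (plain C++, not HotSpot code), the two functions below count iterations with a separate compare and with the subtraction result used directly as the loop condition. The function names are hypothetical, kStep stands in for 4 * XMMBYTES, and the value 64 for XMMBYTES is an assumption. Both loops assume len starts as a positive multiple of kStep, as a notEqual exit test requires.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for 4 * XMMBYTES; 64 bytes per 512-bit register is assumed here.
const int32_t kStep = 4 * 64;

// Original shape: subtract, then compare against zero in a separate step.
int iterations_with_cmp(int32_t len) {
    int count = 0;
    do {
        ++count;           // loop body
        len -= kStep;      // subl(len, 4 * XMMBYTES)
    } while (len != 0);    // cmpl(len, 0); jcc(notEqual, L_loop)
    return count;
}

// Merged shape: the result of the subtraction is itself the loop condition,
// just as subl sets the zero flag that jcc(notEqual) then reads.
int iterations_merged(int32_t len) {
    int count = 0;
    do {
        ++count;           // loop body
    } while ((len -= kStep) != 0);
    return count;
}

// Sanity check that both shapes run the same number of iterations.
bool same_trip_count(int32_t len) {
    assert(len > 0 && len % kStep == 0);
    return iterations_with_cmp(len) == iterations_merged(len);
}
```

The two pointer increments come before the subtraction in the suggested sequence because addptr also writes the flags; placing subl last keeps its zero flag intact for the branch.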

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2032172061
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2032171059
PR Review Comment: https://git.openjdk.org/jdk/pull/23860#discussion_r2031979828
