On Thu, 8 Jan 2026 17:59:35 GMT, Shawn M Emery <[email protected]> wrote:

>> This change allows use of the AVX512_VBMI instruction set to further 
>> optimize decompression/parsing of polynomial coefficients for ML-KEM.  The 
>> speedup gained in the ML-KEM benchmarks is 0.3 to 0.6% for key generation, 
>> 0.4 to 1.7% for encapsulation, and 0.3 to 1.9% for decapsulation.
>> 
>> Thank you to @sviswa7 and @ferakocz for their help in working through the 
>> early stages of this code with me.
>
> Shawn M Emery has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 10 additional 
> commits since the last revision:
> 
>  - Merge with mainline
>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI
>    Change Swap to Dup named function/variable
>    Check for only VBMI support (not VBMI2)
>  - Update copyright year
>  - Merge with mainline
>  - Swap parameter operation with source
>  - Remove wrong mask from evpsrlvw
>  - Reverse ordering for vpermb and vpsrlvw instructions
>  - Switch from vpshldvw to vpsrlvw
>  - Fix whitespaces
>  - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI 
> and AVX512_VBMI2

src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 876:

> 874:     __ evmovdquq(xmm22, Address(perms), Assembler::AVX_512bit);
> 875: 
> 876:     __ BIND(VBMILoop);

It would be better to align the loop's starting address to OptoLoopAlignment.
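
In the x86 stubs this is normally done with `__ align(OptoLoopAlignment);` immediately before the `BIND`. As a rough sketch of what that alignment costs in padding (this is not HotSpot code: the helper name `padding_for` is hypothetical, and the alignment is assumed to be a power of two):

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only (not part of HotSpot).
// Returns how many padding bytes must be emitted so that code currently
// at byte offset `offset` continues on an `alignment`-byte boundary.
// Assumption: `alignment` is a power of two.
static std::uint32_t padding_for(std::uint32_t offset, std::uint32_t alignment) {
    const std::uint32_t mask = alignment - 1;
    // Distance to the next boundary; the final mask turns a full
    // `alignment` into 0 when the offset is already aligned.
    return (alignment - (offset & mask)) & mask;
}
```

For example, with an assumed alignment of 16, a loop head at offset 0x123 would need 13 bytes of padding, and one at 0x120 would need none.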

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2678272848
