On Sat, 28 Feb 2026 02:21:14 GMT, xinyangwu <[email protected]> wrote:
>> ### Summary
>> This PR introduces a parallel intrinsic for AES/ECB operations to replace
>> the current per-block processing approach, reducing native call overhead and
>> improving throughput for multi-block operations.
>> ### Problem
>> Except supporting AVX512, The existing AES/ECB/PKCS5Padding implementation
>> suffers from three major performance issues:
>> 1. Excessive stub call overhead: Each 16-byte block requires a separate
>> intrinsic call, resulting in high invocation frequency
>>
>> 2. Inefficient instruction-level parallelism: The serialized block
>> processing fails to fully utilize instruction-level parallelism
>>
>> 3. Redundant setup/teardown: Repeated initialization of encryption state for
>> each block
>> ### Changes
>> Added parallel AES intrinsic implementation
>> ### Testing
>> JMH benchmarks
>>
>> It can bring about a **37.43%** performance improvement.
>>
>> On a Intel(R) Core(TM) i9-14900HX CPU machine with origin implements:
>>
>>
>> Benchmark     Mode  Cnt      Score    Error  Units
>> AesTest.test  avgt    5  11518.846 ± 68.621  ns/op
>>
>>
>> On the same machine with optimized implements:
>>
>>
>> Benchmark     Mode  Cnt     Score    Error  Units
>> AesTest.test  avgt    5  8381.499 ± 57.751  ns/op
>>
>>
>> All Tier-1 tests pass on linux-x64. This modification does not involve
>> changing the encryption or decryption logic.
>
> xinyangwu has updated the pull request with a new target base due to a merge
> or a rebase. The incremental webrev excludes the unrelated changes brought in
> by the merge/rebase.
> The pull request contains nine additional commits since
> the last revision:
>
>  - refactor
>  - Merge branch 'openjdk:master' into aes
>  - 8376164: Optimize AES/ECB/PKCS5Padding implementation using full-message
>    intrinsic stub and parallel RoundKey addition
>  - Merge branch 'openjdk:master' into aes
>  - 8376164: Optimize AES/ECB/PKCS5Padding implementation using full-message
>    intrinsic stub and parallel RoundKey addition
>  - Merge branch 'openjdk:master' into aes
>  - Merge branch 'openjdk:master' into aes
>  - Merge branch 'openjdk:master' into aes
>  - 8376164: Optimize AES/ECB/PKCS5Padding with parallel intrinsic

Thank you for refactoring, the code looks much better!

Benchmarks after refactoring have come back with negligible changes in
performance from the original contribution. AES and security-related
regression tests have passed with no unknown failures. Unit testing with the
Knights Landing processor setting enabled on applicable Xeon systems running
Windows has also passed.

One last note/question: why is PKCS5 padding called out in the synopsis
specifically, given that these changes also affect the no-padding version?

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 1459:

> 1457:
> 1458: #define DoOne(opc, reg) \
> 1459: __ opc(xmm_result0, reg); \

nit: trailing backslashes

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 1601:

> 1599:
> 1600: #define DoOne(opc, reg) \
> 1601: __ opc(xmm_result0, reg); \

nit: trailing backslashes

-------------

Marked as reviewed by semery (Author).

PR Review: https://git.openjdk.org/jdk/pull/29385#pullrequestreview-3871686368
PR Review Comment: https://git.openjdk.org/jdk/pull/29385#discussion_r2868603522
PR Review Comment: https://git.openjdk.org/jdk/pull/29385#discussion_r2868603932
