On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath <svkam...@openjdk.org> wrote:

>> Hi All,
>> I would like to submit AES-GCM optimization for x86_64 architectures using
>> AVX2 instructions. This optimization interleaves AES and GHASH operations.
>>
>> Below are the performance numbers on my desktop system with -XX:UseAVX=2
>> option:
>>
>> | Benchmark | Data Size | Base version (ops/s) | Patched version (ops/s) | Speedup |
>> |-------------|------------|---------------|------------------|-----------|
>> | full.AESGCMBench.decrypt | 8192 | 526274.678 | 670014.543 | 1.27 |
>> | full.AESGCMBench.encrypt | 8192 | 538293.315 | 680716.207 | 1.26 |
>> | small.AESGCMBench.decrypt | 8192 | 527854.353 | 663131.48 | 1.25 |
>> | small.AESGCMBench.encrypt | 8192 | 548193.804 | 683624.232 | 1.24 |
>> | full.AESGCMBench.decryptMultiPart | 8192 | 299865.766 | 299815.851 | 0.99 |
>> | full.AESGCMBench.encryptMultiPart | 8192 | 534406.564 | 539235.462 | 1.00 |
>> | small.AESGCMBench.decryptMultiPart | 8192 | 299960.202 | 298913.629 | 0.99 |
>> | small.AESGCMBench.encryptMultiPart | 8192 | 542669.258 | 540552.293 | 0.99 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 16384 | 307266.364 | 390397.778 | 1.27 |
>> | full.AESGCMBench.encrypt | 16384 | 311491.901 | 397279.681 | 1.27 |
>> | small.AESGCMBench.decrypt | 16384 | 306257.801 | 389531.665 | 1.27 |
>> | small.AESGCMBench.encrypt | 16384 | 311468.972 | 397804.753 | 1.27 |
>> | full.AESGCMBench.decryptMultiPart | 16384 | 159634.341 | 181271.487 | 1.13 |
>> | full.AESGCMBench.encryptMultiPart | 16384 | 308980.992 | 385606.113 | 1.24 |
>> | small.AESGCMBench.decryptMultiPart | 16384 | 160476.064 | 181019.205 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 16384 | 308382.656 | 391126.417 | 1.26 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 32768 | 162284.703 | 213257.481 | 1.31 |
>> | full.AESGCMBench.encrypt | 32768 | 164833.104 | 215568.639 | 1.30 |
>> | small.AESGCMBench.decrypt | 32768 | 164416.491 | 213422.347 | 1.29 |
>> | small.AESGCMBench.encrypt | 32768 | 166619.205 | 214584.208 | 1.28 |
>> | full.AESGCMBench.decryptMultiPart | 32768 | 83306.239 | 93762.988 | 1.12 |
>> | full.AESGCMBench.encryptMultiPart | 32768 | 166109.391 | 211701.969 | 1.27 |
>> | small.AESGCMBench.decryptMultiPart | 32768 | 83792.559 | 94530.786 | 1.12 |
>> | small.AESGCMBench.encryptMultiPart | 32768 | 162975.904 | 212085.047 | 1.30 |
>> | | | | | |
>> | full.AESGCMBench.decrypt | 65536 | 85765.835 | 112244.611 | 1.30 |
>> | full.AESGCMBench.encrypt | 65536 | 86471.805 | 113320.536 | 1.31 |
>> | small.AESGCMBench.decrypt | 65536 | 84490.816 | 112122.358 | 1.32 |
>> | small.AESGCMBench.encrypt | 65536 | 85403.025 | 112741.811 | 1.32 |
>> full.AES...
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Removed isEncrypt boolean variable

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3627:

> 3625:     __ cmpl(rounds, 52);
> 3626:     __ jcc(Assembler::greaterEqual, aes_192);
> 3627:     __ jmp(last_aes_rnd);

This could be replaced with:
__ jcc(Assembler::below, last_aes_rnd);

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 3649:

> 3647:     __ cmpl(rounds, 60);
> 3648:     __ jcc(Assembler::aboveEqual, aes_256);
> 3649:     __ jmp(last_aes_rnd);

This could be replaced with:
__ jcc(Assembler::below, last_aes_rnd);

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4199:

> 4197:     //The entire message was encrypted processed in initial and now need to be hashed
> 4198:     __ cmpl(len, 0);
> 4199:     __ jcc(Assembler::equal, encrypt_done);

We should check that len is at least 128 here, as the block that follows processes 128 bytes:
__ cmpl(len, 128);
__ jcc(Assembler::less, encrypt_done);

src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4241:

> 4239:     __ jcc(Assembler::equal, encrypt_done);
> 4240:
> 4241:     __ bind(encrypt_done);

This is a fall-through case:
__ cmpl(r14, 0);
__ jcc(Assembler::equal, encrypt_done);
The above two instructions can be removed.

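As a sanity check on the branch simplification suggested for line 3627, here is a minimal C++ model of the two sequences. It is only a sketch under stated assumptions: the `Target` enum and the function names are hypothetical, the `aes_192` label is assumed to be bound immediately after the jump so that it becomes the fall-through target, and `rounds` is assumed to hold a small non-negative value (44, 52 and 60 are used purely as illustrative inputs). Note that `greaterEqual` is a signed condition while `below` is unsigned, so the two forms agree only when `rounds` is non-negative; the line 3649 case already uses `aboveEqual`, whose exact complement is `below`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the stub's branch targets.
enum class Target { Aes192, LastAesRnd };

// Models: cmpl(rounds, 52); jcc(greaterEqual, aes_192); jmp(last_aes_rnd);
// greaterEqual is a signed comparison.
Target original_sequence(int32_t rounds) {
    if (rounds >= 52) {
        return Target::Aes192;
    }
    return Target::LastAesRnd;
}

// Models: cmpl(rounds, 52); jcc(below, last_aes_rnd); <fall through to aes_192>
// below is an unsigned comparison, hence the cast.
Target suggested_sequence(int32_t rounds) {
    if (static_cast<uint32_t>(rounds) < 52u) {
        return Target::LastAesRnd;
    }
    return Target::Aes192;  // fall-through path (assumed to be aes_192)
}

// True when both sequences reach the same target for the given value.
bool sequences_agree(int32_t rounds) {
    return original_sequence(rounds) == suggested_sequence(rounds);
}

// Checks the illustrative inputs; these values are assumptions, not taken
// from the patch itself.
void check_illustrative_values() {
    assert(sequences_agree(44));
    assert(sequences_agree(52));
    assert(sequences_agree(60));
}
```

If `rounds` could ever be negative, `Assembler::less` would be the exact complement of `greaterEqual`; for the non-negative values modeled here the two are interchangeable.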
src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 4246:

> 4244:     __ bind(ghash_done);
> 4245:     __ movdqu(xmm15, ExternalAddress(counter_mask_linc1_addr()), rbx /*rscratch*/);
> 4246:     __ vpaddd(xmm9, xmm9, xmm15, Assembler::AVX_128bit);

We could do the following here:
__ vpaddd(xmm9, xmm9, ExternalAddress(counter_mask_linc1_addr()), Assembler::AVX_128bit, rbx);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334673738
PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334674168
PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334660702
PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334657499
PR Review Comment: https://git.openjdk.org/jdk/pull/15410#discussion_r1334665625