On Thu, 9 Feb 2023 18:08:15 GMT, Scott Gibbons <sgibb...@openjdk.org> wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3   4309.439 ±   2.632  ops/ms
>>
>>
>> **New:**
>>
>> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
>>
>>
>> Decode performance:
>> **Old:**
>>
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3   3961.768 ± 93.409  ops/ms
>>
>> **New:**
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add URL to microbenchmark

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2399:

> 2397:         VM_Version::supports_avx512bw()) {
> 2398:       __ cmpl(length, 31);     // 32-bytes is break-even for AVX-512
> 2399:       __ jcc(Assembler::lessEqual, L_bruteForce);

The AVX2 code needs the length to be at least 0x2c (44) bytes. We could go directly to the non-AVX code here instead of L_bruteForce. That would save one subtract/branch.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2658:

> 2656:     // Check for buffer too small (for algorithm)
> 2657:     __ subl(length, 0x2c);
> 2658:     __ jcc(Assembler::lessEqual, L_tailProc);

This could be Assembler::less instead of Assembler::lessEqual.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2699:

> 2697:     __ addptr(dest, 0x18);
> 2698:     __ subl(length, 0x20);
> 2699:     __ jcc(Assembler::lessEqual, L_tailProc);

This could be Assembler::less instead of Assembler::lessEqual.

-------------

PR: https://git.openjdk.org/jdk/pull/12126
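
For illustration only (this is not HotSpot code): a minimal C++ sketch of how the two condition codes differ after a `subl(length, threshold)`. The names `Cond` and `takesTailBranch` are hypothetical, and the model assumes small non-negative lengths so the subtraction cannot overflow. It does not check whether the AVX2 loop is actually safe for a length of exactly 0x2c; that is taken from the comments above.

```cpp
#include <cstdint>

// Hypothetical model of the two signed condition codes discussed above.
enum class Cond { less, lessEqual };

// Models `subl(length, threshold)` followed by `jcc(cond, L_tailProc)`.
// Returns true when the branch to the tail path would be taken.
// Assumes no signed overflow in the subtraction.
bool takesTailBranch(int32_t length, int32_t threshold, Cond cond) {
    const int32_t remaining = length - threshold;  // value left in `length` after subl
    return cond == Cond::less ? remaining < 0 : remaining <= 0;
}
```

With a threshold of 0x2c, `takesTailBranch(0x2c, 0x2c, Cond::lessEqual)` is true while `takesTailBranch(0x2c, 0x2c, Cond::less)` is false. For every other length the two conditions agree, so the suggested change only affects inputs whose length equals the threshold exactly.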