On Tue, 7 Feb 2023 00:12:21 GMT, Scott Gibbons <d...@openjdk.org> wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark                      (maxNumBytes)   Mode  Cnt     Score   Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  4309.439 ± 2.632  ops/ms
>>
>>
>> **New:**
>>
>> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
>>
>>
>> Decode performance:
>> **Old:**
>>
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt     Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  3961.768 ± 93.409  ops/ms
>>
>> **New:**
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add algorithm comments

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2720:

> 2718:     __ vpshufb(xmm5, xmm9, xmm1, Assembler::AVX_256bit);
> 2719:     // If the and of the two is non-zero, we have an invalid input character
> 2720:     __ vptest(xmm3, xmm5);

For isURL, it looks to me that the vptest will reject the valid URL-safe input 0x5F ("_"):

upper_nibble = 0x5; lower_nibble = 0xF;
lut_lo_URL = 0x1B; (corresponding to lower nibble 0xF)
lut_hi = 0x8; (corresponding to upper nibble 0x5)
lut_lo_URL & lut_hi = 0x8; (not zero, so the character is treated as not allowed and the loop exits)

Could you please verify this on your end and fix it? My understanding is that this happens because upper nibbles 5 and 7 both get the same lut_hi encoding, 0x8.

-------------

PR: https://git.openjdk.org/jdk/pull/12126
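
For anyone who wants to check the arithmetic in the review comment without running the stub, here is a minimal scalar sketch of the nibble check for a single byte. It is not the HotSpot code: the helper names (`upper_nibble`, `lower_nibble`, `flagged_invalid`) and the two constants are hypothetical, and only the two table entries quoted in the comment are used. The rest of `lut_lo_URL` and `lut_hi` is not reproduced.

```cpp
#include <cstdint>

// The only two table entries quoted in the review comment (assumed correct as
// quoted; the full lookup tables are not reproduced here).
constexpr std::uint8_t kLutLoUrlForF = 0x1B;  // lut_lo_URL entry for lower nibble 0xF
constexpr std::uint8_t kLutHiFor5    = 0x08;  // lut_hi entry for upper nibble 0x5

// Split a byte into the two 4-bit indices used for the table lookups.
constexpr std::uint8_t upper_nibble(std::uint8_t c) { return c >> 4; }
constexpr std::uint8_t lower_nibble(std::uint8_t c) { return c & 0x0F; }

// Scalar stand-in for one byte lane of the check: a nonzero AND of the two
// looked-up values means the character is treated as invalid.
constexpr bool flagged_invalid(std::uint8_t lut_lo, std::uint8_t lut_hi) {
    return (lut_lo & lut_hi) != 0;
}

// '_' is 0x5F, so it indexes lut_hi with 0x5 and lut_lo_URL with 0xF.
static_assert(upper_nibble(0x5F) == 0x5, "upper nibble of '_' is 0x5");
static_assert(lower_nibble(0x5F) == 0xF, "lower nibble of '_' is 0xF");

// 0x1B & 0x08 == 0x08, so '_' is flagged even though it is valid for URL-safe Base64.
static_assert(flagged_invalid(kLutLoUrlForF, kLutHiFor5),
              "'_' is flagged invalid with the quoted table entries");
```

If the quoted table entries are right, the static assertions compile cleanly, which matches the concern raised above: the combination for "_" yields a nonzero AND.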