On Mon, 23 Jan 2023 11:58:58 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> Added code for Base64 acceleration (encode and decode), which gives a ~4x 
>> speedup on AVX2 platforms.
>> 
>> Encode performance:
>> **Old:**
>> 
>> Benchmark                      (maxNumBytes)   Mode  Cnt     Score   Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  4309.439 ± 2.632  ops/ms
>> 
>> 
>> **New:**
>> 
>> Benchmark                      (maxNumBytes)   Mode  Cnt      Score     Error   Units
>> Base64Encode.testBase64Encode           1024  thrpt    3  24211.397 ± 102.026  ops/ms
>> 
>> 
>> Decode performance:
>> **Old:**
>> 
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt     Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  3961.768 ± 93.409  ops/ms
>> 
>> **New:**
>> Benchmark                      (errorIndex)  (lineSize)  (maxNumBytes)   Mode  Cnt      Score    Error   Units
>> Base64Decode.testBase64Decode           144           4           1024  thrpt    3  14738.051 ± 24.383  ops/ms
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2661:
> 
>> 2659:     __ vpbroadcastq(xmm4, Address(r13, 0), Assembler::AVX_256bit);
>> 2660:     __ vmovdqu(xmm11, Address(r13, 0x28));
>> 2661:     __ vpbroadcastb(xmm10, Address(r13, 0), Assembler::AVX_256bit);
> 
> Sorry in advance since I'm probably reading this wrong: the data that `r13` 
> is pointing to appears to be a repeated byte pattern (`0x2f2f2f...`), does 
> this mean this `vpbroadcastb` and the `vpbroadcastq` above end up filling up 
> their respective registers with the exact same bits? If so, and since neither 
> of them is mutated in the code below, then perhaps this can be simplified a 
> bit.

You're reading it correctly: this is redundant and could be handled 
differently, since the same value is loaded into both ymm4 and ymm10. I don't 
think there would be any significant performance gain either way. It was done 
this way to ease the transition to URL acceleration once that is implemented, 
since the URL-safe alphabet uses '-' and '_' instead of '+' and '/' ('/' = 
0x2f).
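
As a rough illustration of why the two loads coincide, here is a minimal C++ 
sketch that models the two broadcasts on a plain 32-byte array. It is not the 
stub generator code: `Ymm`, `broadcast_byte`, `broadcast_qword`, and 
`broadcasts_match` are hypothetical names, and the all-0x2f input is assumed 
from the review comment quoted above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for a 256-bit register, viewed as 32 bytes (illustrative only).
using Ymm = std::array<std::uint8_t, 32>;

// Models vpbroadcastb: replicate the first byte at src into all 32 byte lanes.
Ymm broadcast_byte(const std::uint8_t* src) {
    Ymm r;
    r.fill(src[0]);
    return r;
}

// Models vpbroadcastq: replicate the first 8 bytes at src into all four
// 64-bit lanes. src must point to at least 8 readable bytes.
Ymm broadcast_qword(const std::uint8_t* src) {
    Ymm r;
    for (std::size_t lane = 0; lane < 4; ++lane) {
        std::memcpy(r.data() + lane * 8, src, 8);
    }
    return r;
}

// Hypothetical helper: true when both broadcasts of src yield identical bits.
bool broadcasts_match(const std::uint8_t* src) {
    return broadcast_byte(src) == broadcast_qword(src);
}
```

With eight bytes of 0x2f as input, `broadcasts_match` returns true, which is 
the redundancy discussed above. If the eight source bytes were not all equal, 
the two results would differ, so the two loads are only interchangeable while 
the source data is a single repeated byte.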

-------------

PR: https://git.openjdk.org/jdk/pull/12126
