On Fri, 8 Sep 2023 10:32:40 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> This PR seeks to improve formatting of hex digits using 
>> `java.util.HexFormat` somewhat.
>> 
>> This is achieved by getting rid of a couple of lookup tables, caching the 
>> result of `HexFormat.of().withUpperCase()`, and removing a tiny allocation 
>> that happens in the `formatHex(A, byte)` method. Improvements range from 
>> 20-40% on throughput, and some operations allocate less:
>> 
>> 
>> Name                               Cnt   Base   Error   Test   Error   Unit    Diff%
>> HexFormatBench.appenderLower        15  1,330 ± 0,021  1,065 ± 0,067  us/op    19,9% (p = 0,000*)
>>   :gc.alloc.rate                    15 11,481 ± 0,185  0,007 ± 0,000 MB/sec   -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op  -100,0% (p = 0,000*)
>>   :gc.count                         15  3,000          0,000         counts
>>   :gc.time                           3  2,000                            ms
>> HexFormatBench.appenderLowerCached  15  1,317 ± 0,013  1,065 ± 0,054  us/op    19,1% (p = 0,000*)
>>   :gc.alloc.rate                    15 11,590 ± 0,111  0,007 ± 0,000 MB/sec   -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op  -100,0% (p = 0,000*)
>>   :gc.count                         15  3,000          0,000         counts
>>   :gc.time                           3  2,000                            ms
>> HexFormatBench.appenderUpper        15  1,330 ± 0,022  1,065 ± 0,036  us/op    19,9% (p = 0,000*)
>>   :gc.alloc.rate                    15 34,416 ± 0,559  0,007 ± 0,000 MB/sec  -100,0% (p = 0,000*)
>>   :gc.alloc.rate.norm               15 48,009 ± 0,000  0,007 ± 0,000   B/op  -100,0% (p = 0,000*)
>>   :gc.count                         15  0,000          0,000         counts
>> HexFormatBench.appenderUpperCached  15  1,353 ± 0,009  1,033 ± 0,014  us/op    23,6% (p = 0,000*)
>>   :gc.alloc.rate                    15 11,284 ± 0,074  0,007 ± 0,000 MB/sec   -99,9% (p = 0,000*)
>>   :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op  -100,0% (p = 0,000*)
>>   :gc.count                         15  3,000          0,000         counts
>>   :gc.time                           3  2,000                            ms
>> HexFormatBench.toHexLower           15  0,198 ± 0,001  0,119 ± 0,008  us/op    40,1% (p = 0,000*)
>>   :gc.alloc.rate                    15  0,007 ± 0,000  0,007 ± 0,000 MB/sec    -0,0% (p = 0,816 )
>>   :gc.alloc.rate.norm               15  0,001 ± 0,000  0,001 ± 0,000   B/op   -40,1% (p = 0,000*)
>>   :gc....
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add toHexDigitsByte|Short|Int|Long microbenchmarks

I ran some experiments with a lookup-table approach (based on `HexDigits`), and 
I can get some of these to be marginally faster when combining the lookup table 
with the `ByteArray` hack, but there's no win from using either one in 
isolation. So I think much of the win comes not from the lookup table itself 
but from tickling the JIT into inlining more and optimizing a bit more 
aggressively. This might be a case of microbenchmarks telling us sweet little 
lies, and we should favor the intuition that lookup tables should be avoided 
unless absolutely necessary.
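
To make the two alternatives concrete, here is a minimal sketch contrasting a 
table-based hex digit conversion with a purely arithmetic one. The class name 
`HexDigitSketch`, its method names, and the 16-entry table are hypothetical and 
for illustration only; this is not the code in the PR, and it does not mirror 
the actual layout of `jdk.internal.util.HexDigits`.

```java
public final class HexDigitSketch {
    // Hypothetical 16-entry table, assumed for illustration only.
    private static final char[] TABLE = "0123456789abcdef".toCharArray();

    /** Lookup-table variant: one array load per nibble. */
    public static char toLowHexDigitTable(int value) {
        return TABLE[value & 0xF];
    }

    /** Arithmetic variant: no table, just a compare and an add. */
    public static char toLowHexDigitArithmetic(int value) {
        int nibble = value & 0xF;
        return (char) (nibble < 10 ? '0' + nibble : 'a' - 10 + nibble);
    }

    /** Formats a byte as two lowercase hex digits using the arithmetic variant. */
    public static String toHex(byte b) {
        // b >> 4 sign-extends, but the & 0xF mask in the helper keeps only the nibble.
        return new String(new char[] {
            toLowHexDigitArithmetic(b >> 4),
            toLowHexDigitArithmetic(b)
        });
    }

    public static void main(String[] args) {
        // Both variants should agree on every nibble value.
        for (int i = 0; i < 16; i++) {
            if (toLowHexDigitTable(i) != toLowHexDigitArithmetic(i)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println(toHex((byte) 0xa5));
    }
}
```

The arithmetic variant needs no static state and no memory load, which is one 
reason to prefer it when a benchmark shows no clear difference between the two.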

I prefer the simplicity of this PR as it stands and think we should backtrack 
on some of the lookup tables we've recently added in 
`jdk.internal.util.Hex|Decimal|OctalDigits`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15591#issuecomment-1718230389
