Re: RFR: 8364320: String encodeUTF8 latin1 with negatives

Brett Okken Mon, 11 Aug 2025 06:35:49 -0700

On Fri, 1 Aug 2025 16:12:46 GMT, Chen Liang <li...@openjdk.org> wrote:


>> Benchmark on win64
>> 
>> Baseline:
>> 
>> 
>> Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
>> StringEncode.encodeAllMixed                 UTF-8  avgt   10  20067.519 ± 528.152  ns/op
>> StringEncode.encodeAsciiLong                UTF-8  avgt   10  12115.389 ± 307.491  ns/op
>> StringEncode.encodeAsciiShort               UTF-8  avgt   10     70.098 ±   1.696  ns/op
>> StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10   1974.391 ± 162.405  ns/op
>> StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    270.097 ±  13.840  ns/op
>> StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   1876.366 ±  51.971  ns/op
>> StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   4973.070 ± 130.426  ns/op
>> StringEncode.encodeLatin1Short              UTF-8  avgt   10     96.227 ±   2.816  ns/op
>> StringEncode.encodeShortMixed               UTF-8  avgt   10    360.586 ±   8.691  ns/op
>> StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1534.748 ±  34.584  ns/op
>> StringEncode.encodeUTF16LongOnly            UTF-8  avgt   10    528.919 ±  15.143  ns/op
>> StringEncode.encodeUTF16LongStart           UTF-8  avgt   10   2275.117 ±  50.152  ns/op
>> StringEncode.encodeUTF16Mixed               UTF-8  avgt   10   4398.943 ± 116.607  ns/op
>> StringEncode.encodeUTF16Short               UTF-8  avgt   10    152.219 ±   8.677  ns/op
>> 
>> 
>> 
>> Patch:
>> 
>> Benchmark                           (charsetName)  Mode  Cnt      Score     Error  Units
>> StringEncode.encodeAllMixed                 UTF-8  avgt   10  18876.056 ± 330.644  ns/op
>> StringEncode.encodeAsciiLong                UTF-8  avgt   10  12040.590 ± 165.905  ns/op
>> StringEncode.encodeAsciiShort               UTF-8  avgt   10     69.895 ±   0.318  ns/op
>> StringEncode.encodeLatin1LongEnd            UTF-8  avgt   10    574.455 ±  14.769  ns/op
>> StringEncode.encodeLatin1LongOnly           UTF-8  avgt   10    284.553 ±   1.886  ns/op
>> StringEncode.encodeLatin1LongStart          UTF-8  avgt   10   2230.789 ±  11.043  ns/op
>> StringEncode.encodeLatin1Mixed              UTF-8  avgt   10   3278.998 ±  96.779  ns/op
>> StringEncode.encodeLatin1Short              UTF-8  avgt   10     99.332 ±   1.977  ns/op
>> StringEncode.encodeShortMixed               UTF-8  avgt   10    378.183 ±  17.504  ns/op
>> StringEncode.encodeUTF16LongEnd             UTF-8  avgt   10   1531.960 ±  19.300  ns/op
>> StringEncode.encodeUTF16LongOnly            U...
>
> @bokken FYI to make JMH comparison easier, you can let JMH generate JSON 
> reports, upload them to github gists, and use https://jmh.morethan.io/ to 
> compare the two results from two gists.

@liach / @RogerRiggs I have been experimenting locally with some other options 
that are a bit more complex:
https://github.com/bokken/jdk/commits/string-utf8-mincopylength/
This commit seems to strike a decent balance between complexity and gain: 
https://github.com/bokken/jdk/commit/ee9d9e3496052fd5084f989bd7181504989d812b

I am continuing to evaluate various options.
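
For a quick side-by-side reading of the two runs quoted above, the relative
change per benchmark can be computed from the reported scores. The following is
only a sketch: the class name BenchDelta and the helper percentChange are
hypothetical, the numbers are copied from the quoted win64 output, and only a
few of the rows are included.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BenchDelta {
    // Relative change of the patched score against the baseline, in percent.
    // Scores are ns/op in avgt mode, so a negative value means the patch is faster.
    public static double percentChange(double baseline, double patch) {
        return (patch - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // {baseline, patch} scores in ns/op, copied from the quoted runs.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("encodeAllMixed", new double[] {20067.519, 18876.056});
        scores.put("encodeLatin1LongEnd", new double[] {1974.391, 574.455});
        scores.put("encodeLatin1LongStart", new double[] {1876.366, 2230.789});
        scores.put("encodeLatin1Mixed", new double[] {4973.070, 3278.998});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-24s %+7.1f%%%n", e.getKey(), percentChange(s[0], s[1]));
        }
    }
}
```

This ignores the reported error margins, so small differences (for example
encodeAsciiShort, 70.098 vs 69.895 ns/op) should be read as noise rather than
as a real change.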

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26597#issuecomment-3174871115
