On Wed, 15 Mar 2023 14:23:35 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
>> Many thanks to have tried, yep, I was curious indeed re the >> "StringLatin1.canEncode regression" case. >> I would still modify the benchmark to use inputs (I know that will make it >> memory bound sadly, due to reading inputs - but the size of such inputs can >> be a benchmark parameter, together with the bias eg "latin","mix", >> "non-latin") "semi-randomly" generated based on the mentioned >> strategies/biases. >> It will benefit future tests on this, although could be provided as a >> separate PR. > >> The StringLatin1.canEncode regression disappears. > > I mixed things up so StringLatin1.canEncode was benchmarked without the > updated code. > > Here are updated benchmark results: > > > Baseline: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.437 ± 0.235 ns/op > > > PR: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.319 ± 0.341 ns/op > > > StringLatin1.canEncode: > > > Benchmark (codePoint) Mode Cnt Score Error Units > Characters.isDigitRandom 1632 avgt 15 5.447 ± 0.304 ns/op > ``` > > So it seems using StringLatin1.canEncode still might have a regression also > in the randomized input case. > > For this PR, I suggest we update StringLatin1.canEncode to be in sync with > CharacterData.of, without one calling the other. If anyone wants to > investigate the regression further, than can be done outside this PR. > > I have independently verified that StringLatin1.canEncode sees performance > improvements using the StringIndexOf benchmark. We can do `Integer.compareUnsigned(ch, 0xFF) <= 0` ------------- PR: https://git.openjdk.org/jdk/pull/13040