On Wed, 15 Mar 2023 14:31:03 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:

>> By avoiding a bit shift operation for the latin1 fast-path test, we can 
>> speed up the `java.lang.CharacterData.of` method by ~25% for latin1 code 
>> points.
>> 
>> The latin1 test is currently implemented as `ch >>> 8 == 0`.  We can replace 
>> this with `ch >= 0 && ch <= 0xFF` for a noticeable performance gain 
>> (especially for Latin1 code points).
>> 
>> This method is called frequently by various property-determining methods in 
>> `java.lang.Character` like `isLowerCase`, `isDigit` etc., so one should 
>> expect improvements for all these methods.
>> 
>> Performance is tested using the `Characters.isDigit` benchmark using the 
>> digits '0' (decimal 48, in CharacterDataLatin1) and '\u0660' (decimal 1632, 
>> in CharacterData00):
>> 
>> Baseline:
>> 
>> 
>> Benchmark           (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit           48  avgt   15  0.870 ± 0.011  ns/op
>> Characters.isDigit         1632  avgt   15  2.168 ± 0.017  ns/op
>> 
>> PR:
>> 
>> 
>> Benchmark           (codePoint)  Mode  Cnt  Score   Error  Units
>> Characters.isDigit           48  avgt   15  0.654 ± 0.007  ns/op
>> Characters.isDigit         1632  avgt   15  2.032 ± 0.019  ns/op
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update StringLatin1.canEncode to sync with same test in CharacterData.of

Just for fun, I tried a benchmark where only every other call uses a Latin1 
code point:


@Benchmark
public void isDigitVarying(Blackhole blackhole) {
    blackhole.consume(Character.isDigit(48));
    blackhole.consume(Character.isDigit(1632));
}


With this benchmark, the baseline, the PR and the StringLatin1.canEncode 
variant are indistinguishable; all three scores fall within each other's 
error margins:

Baseline:


Benchmark                  (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitVarying         1632  avgt   15  1.198 ± 0.056  ns/op


PR:


Benchmark                  (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitVarying         1632  avgt   15  1.195 ± 0.058  ns/op


StringLatin1.canEncode:


Benchmark                  (codePoint)  Mode  Cnt  Score   Error  Units
Characters.isDigitVarying         1632  avgt   15  1.193 ± 0.055  ns/op

At this point, I'm starting to wonder whether the performance benefits 
suggested by this PR are real in practice or will only surface in very narrow 
benchmarks. On the other hand, the change does not seem harmful either. What 
do people think?
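
For context on what is being compared: the sketch below checks that the 
shift-based test (`ch >>> 8 == 0`) and the range-based test 
(`ch >= 0 && ch <= 0xFF`) quoted above agree on a range of inputs, including 
negative values. The class and method names (`Latin1TestCheck`, `shiftTest`, 
`rangeTest`, `countMismatches`) are made up for illustration and are not JDK 
code. It only checks equivalence and says nothing about performance.

```java
public class Latin1TestCheck {

    // Shift-based latin1 test, as quoted in the PR description.
    static boolean shiftTest(int ch) {
        return ch >>> 8 == 0;
    }

    // Range-based latin1 test, as proposed in the PR description.
    static boolean rangeTest(int ch) {
        return ch >= 0 && ch <= 0xFF;
    }

    // Counts the values in [from, to] on which the two tests disagree.
    // A long loop counter avoids overflow when 'to' is Integer.MAX_VALUE.
    static long countMismatches(int from, int to) {
        long mismatches = 0;
        for (long i = from; i <= to; i++) {
            int ch = (int) i;
            if (shiftTest(ch) != rangeTest(ch)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Covers negative values, all of Latin1, and values above 0xFF.
        System.out.println(countMismatches(-0x20000, 0x20000));
    }
}
```

Since both forms accept exactly 0 through 0xFF, any difference between them 
can only come from the code the JIT generates, which fits with the gain 
disappearing once the benchmark input varies.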

-------------

PR: https://git.openjdk.org/jdk/pull/13040
