On Tue, 21 Feb 2023 06:59:47 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:

>> This PR suggests we speed up Character.toUpperCase and Character.toLowerCase 
>> for latin1 code points by applying the 'oldest ASCII trick in the book'.
>> 
>> This takes advantage of the fact that latin1 uppercase code points are 
>> always 0x20 lower than their lowercase (with the exception of two code 
>> points which uppercase out of latin1).
>> 
>> To verify the correctness of the new implementation, the test 
>> `Latin1CaseConversion` is added with an exhaustive verification of 
>> toUpperCase/toLowerCase for all latin1 code points.
>> 
>> The implementation needs to balance the performance of the various ranges in 
>> latin1. An effort has been made to favour operations on ASCII code points, 
>> without causing excessive regression for higher code points.
>> 
>> Performance is benchmarked for 7 chosen sample code points, each 
>> representing a range or a special-case.  Results in the first comment.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Spell fix for 'exhaustive' in comments in sun/text/resources

A side note: Early and crude experiments using the Vector API indicate that 
the 'oldest ASCII trick in the book' vectorizes pretty well.

Here's a benchmark comparing the strings "helloworld" and "HelloWorld" repeated 
1024 times, followed by either 'A' or 'B' (to force an expensive mismatch):


Benchmark                    (size)  Mode  Cnt     Score    Error  Units
EqualsIgnoreCase.scalar        1024  avgt   15  6225.624 ± 89.182  ns/op
EqualsIgnoreCase.vectorized    1024  avgt   15  1246.110 ± 14.767  ns/op


I have the feeling that most case-insensitive comparisons are pretty short, so 
I'm not sure how useful this is in practice.
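
For reference, here is a minimal scalar sketch of the trick as it could apply 
to a case-insensitive comparison of latin1 bytes. This is not the code from 
the PR or from the benchmark above; the class name `Latin1CaseSketch` and both 
method shapes are made up for illustration. It assumes the inputs are latin1 
encoded and that clearing bit 0x20 is only valid for A-Z and for 0xC0-0xDE 
minus the multiplication sign (0xD7):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only, not the JDK implementation.
final class Latin1CaseSketch {

    // True if the two latin1 bytes are equal, ignoring case.
    static boolean equalsIgnoreCase(byte b1, byte b2) {
        if (b1 == b2) {
            return true;
        }
        // Clearing bit 0x20 maps a latin1 lowercase letter to its uppercase form.
        int upper = b1 & 0xDF;
        if (upper != (b2 & 0xDF)) {
            return false;
        }
        // The bytes differ only in bit 0x20. Accept only if the folded value
        // is a real uppercase letter: A-Z, or 0xC0-0xDE except 0xD7.
        // This rejects pairs such as '@'/'`', 0xD7/0xF7 and 0xDF/0xFF.
        return (upper >= 'A' && upper <= 'Z')
            || (upper >= 0xC0 && upper <= 0xDE && upper != 0xD7);
    }

    // Element-wise comparison of two latin1 byte arrays.
    static boolean equalsIgnoreCase(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (!equalsIgnoreCase(a[i], b[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Inputs shaped like the benchmark description: a long common prefix
        // that differs only by case, then a final character.
        String lower = "helloworld".repeat(1024);
        String mixed = "HelloWorld".repeat(1024);
        byte[] a = (lower + "A").getBytes(StandardCharsets.ISO_8859_1);
        byte[] b = (mixed + "B").getBytes(StandardCharsets.ISO_8859_1);
        byte[] c = (mixed + "a").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(equalsIgnoreCase(a, b)); // false: 'A' vs 'B' at the end
        System.out.println(equalsIgnoreCase(a, c)); // true: differs only by case
    }
}
```

The per-byte check is branch-light (one mask, one compare, two range tests), 
which is presumably why it maps well onto vector lanes; a vectorized version 
would apply the same mask and range tests to a whole lane group at once.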

-------------

PR: https://git.openjdk.org/jdk/pull/12623
