On Tue, 21 Feb 2023 06:59:47 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
>> This PR suggests we speed up Character.toUpperCase and Character.toLowerCase
>> for latin1 code points by applying the 'oldest ASCII trick in the book'.
>>
>> This takes advantage of the fact that latin1 uppercase code points are
>> always 0x20 lower than their lowercase (with the exception of two code
>> points which uppercase out of latin1).
>>
>> To verify the correctness of the new implementation, the test
>> `Latin1CaseConversion` is added with an exhaustive verification of
>> toUpperCase/toLowerCase for all latin1 code points.
>>
>> The implementation needs to balance the performance of the various ranges in
>> latin1. An effort has been made to favour operations on ASCII code points,
>> without causing excessive regression for higher code points.
>>
>> Performance is benchmarked for 7 chosen sample code points, each
>> representing a range or a special-case. Results in the first comment.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Spell fix for 'exhaustive' in comments in sun/text/resources

A side note: Early and crude experiments using the Vector API indicate that
the 'oldest ASCII trick in the book' vectorizes pretty well.

Here's a benchmark comparing the strings "helloworld" and "HelloWorld"
repeated 1024 times, followed by either 'A' or 'B' (to force an expensive
mismatch):

Benchmark                    (size)  Mode  Cnt     Score    Error  Units
EqualsIgnoreCase.scalar        1024  avgt   15  6225.624 ± 89.182  ns/op
EqualsIgnoreCase.vectorized    1024  avgt   15  1246.110 ± 14.767  ns/op

That is roughly a 5x speedup for the vectorized version on this input. I have
the feeling that most case-insensitive comparisons are pretty short, though,
so I'm not sure how useful this is IRL.

-------------

PR: https://git.openjdk.org/jdk/pull/12623
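
For readers who have not seen the 'oldest ASCII trick in the book', here is a minimal scalar sketch of the idea described in the quoted PR text. It is not the code from the PR: the class name `Latin1CaseSketch` and the method `toUpperCaseLatin1` are made up for illustration, and the two special cases are assumed to be U+00B5 (micro sign) and U+00FF (y with diaeresis), which are the latin1 code points whose uppercase forms lie outside latin1.

```java
public class Latin1CaseSketch {

    // Hypothetical helper, for illustration only (not the PR's implementation).
    // Uppercases a single latin1 code point (0x00..0xFF).
    static int toUpperCaseLatin1(int cp) {
        // ASCII a-z: the uppercase form is exactly 0x20 lower.
        if (cp >= 'a' && cp <= 'z') {
            return cp - 0x20;
        }
        // Latin1 lowercase block 0xE0..0xFE, skipping 0xF7 (division sign),
        // which is not a letter and must stay unchanged.
        if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7) {
            return cp - 0x20;
        }
        // The two code points that uppercase out of latin1.
        if (cp == 0xB5) {
            return 0x39C; // MICRO SIGN -> GREEK CAPITAL LETTER MU
        }
        if (cp == 0xFF) {
            return 0x178; // y with diaeresis -> capital Y with diaeresis
        }
        // Everything else (including 0xDF, sharp s) maps to itself.
        return cp;
    }

    public static void main(String[] args) {
        // Exhaustive check against the JDK, in the spirit of the
        // Latin1CaseConversion test mentioned in the PR description.
        for (int cp = 0; cp <= 0xFF; cp++) {
            int expected = Character.toUpperCase(cp);
            int actual = toUpperCaseLatin1(cp);
            if (expected != actual) {
                throw new AssertionError(
                    "Mismatch at 0x" + Integer.toHexString(cp));
            }
        }
        System.out.println("all latin1 code points match");
    }
}
```

The range checks are ordered so that ASCII input returns first, which mirrors the PR's stated goal of favouring ASCII code points; how the actual patch arranges its branches may differ.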