One option is to have a fast path that uses char functions, up to the point where you hit a high surrogate, then drop into the slower codepoint path. That saves time for the high-runner cases.
On the other hand, if the times are good enough, you might not need the complication. Mark On Fri, Jul 17, 2020 at 4:39 PM <naoto.s...@oracle.com> wrote: > Hi, > > Based on the suggestions, I modified the fix as follows: > > https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.01/ > > Changes from the initial revision are: > > - Shared the implementation between compareToCI() and regionMatchesCI() > - Enabled immediate short cut if two code points match. > - Created a simple JMH benchmark. Here is the scores before and after > the change: > > before: > Benchmark Mode Cnt Score Error Units > StringCompareToIgnoreCase.lower avgt 25 53.764 ? 2.811 ns/op > StringCompareToIgnoreCase.supLower avgt 25 24.211 ? 1.135 ns/op > StringCompareToIgnoreCase.supUpperLower avgt 25 30.595 ? 1.344 ns/op > StringCompareToIgnoreCase.upperLower avgt 25 18.859 ? 1.499 ns/op > > after: > Benchmark Mode Cnt Score Error Units > StringCompareToIgnoreCase.lower avgt 25 58.354 ? 4.603 ns/op > StringCompareToIgnoreCase.supLower avgt 25 57.975 ? 5.672 ns/op > StringCompareToIgnoreCase.supUpperLower avgt 25 23.912 ? 0.965 ns/op > StringCompareToIgnoreCase.upperLower avgt 25 17.744 ? 0.272 ns/op > > Here, "sup" means all supplementary characters, BMP otherwise. "lower" > means each character requires upper->lower casemap. "upperLower" means > all characters are the same, except the last character which requires > casemap. > > I think the result is reasonable, considering surrogates check are now > mandatory. I have tried Roger's suggestion to use Arrays.mismatch() but > it did not seem to benefit here. In fact, the performance degraded > partly because I implemented the short cut, and possibly for the > overhead of extra checks. > > Naoto > > On 7/15/20 9:00 AM, naoto.s...@oracle.com wrote: > > Hello, > > > > Please review the fix to the following issues: > > > > https://bugs.openjdk.java.net/browse/JDK-8248655 > > https://bugs.openjdk.java.net/browse/JDK-8248434 > > > > The proposed changeset and its CSR are located at: > > > > https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/ > > https://bugs.openjdk.java.net/browse/JDK-8248664 > > > > A bug was filed against SimpleDateFormat (8248434) where > > case-insensitive date format/parse failed in some of the new locales in > > JDK15. The root cause was that case-insensitive String.regionMatches() > > method did not work with supplementary characters. The problem is that > > the method's spec does not expect case mappings of supplementary > > characters, possibly because it was overlooked in the first place, JSR > > 204 - "Unicode Supplementary Character support". Similar behavior is > > observed in other two case-insensitive methods, i.e., > > compareToIgnoreCase() and equalsIgnoreCase(). > > > > The fix is straightforward to compare strings by code point basis, > > instead of code unit (16bit "char") basis. Technically this change will > > introduce a backward incompatibility, but I believe it is an > > incompatibility to wrong behavior, not true to the meaning of those > > methods' expectations. > > > > Naoto >