Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

naoto . sato Fri, 17 Jul 2020 16:39:59 -0700

Hi,

Based on the suggestions, I modified the fix as follows:


https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.01/

Changes from the initial revision are:

- Shared the implementation between compareToCI() and regionMatchesCI()
- Enabled immediate short cut if two code points match.

- Created a simple JMH benchmark. Here are the scores before and after the change:


before:
Benchmark                                Mode  Cnt   Score   Error  Units
StringCompareToIgnoreCase.lower          avgt   25  53.764 ± 2.811  ns/op
StringCompareToIgnoreCase.supLower       avgt   25  24.211 ± 1.135  ns/op
StringCompareToIgnoreCase.supUpperLower  avgt   25  30.595 ± 1.344  ns/op
StringCompareToIgnoreCase.upperLower     avgt   25  18.859 ± 1.499  ns/op

after:
Benchmark                                Mode  Cnt   Score   Error  Units
StringCompareToIgnoreCase.lower          avgt   25  58.354 ± 4.603  ns/op
StringCompareToIgnoreCase.supLower       avgt   25  57.975 ± 5.672  ns/op
StringCompareToIgnoreCase.supUpperLower  avgt   25  23.912 ± 0.965  ns/op
StringCompareToIgnoreCase.upperLower     avgt   25  17.744 ± 0.272  ns/op

Here, "sup" means all supplementary characters, BMP otherwise. "lower" means each character requires upper->lower casemap. "upperLower" means all characters are the same, except the last character, which requires casemap.

I think the result is reasonable, considering surrogate checks are now mandatory. I have tried Roger's suggestion to use Arrays.mismatch() but it did not seem to benefit here. In fact, the performance degraded, partly because I implemented the short cut, and possibly because of the overhead of extra checks.
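
For readers unfamiliar with the suggestion, here is a rough sketch of what a prefix skip with Arrays.mismatch() could look like. It is not the code from the webrev or from my experiment; the class and method names (MismatchPrefixSkip, firstCandidate, compareIgnoreCase) are made up for illustration, and it works on char[] copies rather than on String's internal storage. The one subtle point it shows is that the mismatch index can land on a low surrogate, so the scan has to step back to the start of that code point.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical names, not the JDK implementation.
public final class MismatchPrefixSkip {
    private MismatchPrefixSkip() {}

    /**
     * Returns the index where case-insensitive comparison should start,
     * or -1 if the two arrays are identical.
     */
    public static int firstCandidate(char[] a, char[] b) {
        int k = Arrays.mismatch(a, b);
        if (k < 0) {
            return -1; // no mismatch at all
        }
        // a[k - 1] == b[k - 1] here. If it is a high surrogate, the differing
        // char at k may be the second half of a pair, so back up one char.
        if (k > 0 && Character.isHighSurrogate(a[k - 1])) {
            k--;
        }
        return k;
    }

    /** Case-insensitive comparison by code point, skipping the equal prefix. */
    public static int compareIgnoreCase(String s1, String s2) {
        char[] a = s1.toCharArray();
        char[] b = s2.toCharArray();
        int k = firstCandidate(a, b);
        if (k < 0) {
            return 0;
        }
        int i = k;
        int j = k;
        while (i < a.length && j < b.length) {
            int cp1 = Character.codePointAt(a, i);
            int cp2 = Character.codePointAt(b, j);
            i += Character.charCount(cp1);
            j += Character.charCount(cp2);
            if (cp1 != cp2) {
                // Fold upper then lower, as the String case-insensitive
                // methods document for individual characters.
                int f1 = Character.toLowerCase(Character.toUpperCase(cp1));
                int f2 = Character.toLowerCase(Character.toUpperCase(cp2));
                if (f1 != f2) {
                    return f1 - f2;
                }
            }
        }
        // One side is exhausted: the side with chars left over sorts later.
        return (a.length - i) - (b.length - j);
    }
}
```

The extra call, the array copies, and the surrogate step-back are all overhead on top of a loop that already skips equal code points cheaply, which is consistent with what I observed, although this sketch was not itself benchmarked.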


Naoto

On 7/15/20 9:00 AM, [email protected] wrote:

Hello,

Please review the fix to the following issues:

https://bugs.openjdk.java.net/browse/JDK-8248655
https://bugs.openjdk.java.net/browse/JDK-8248434

The proposed changeset and its CSR are located at:

https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8248664

A bug was filed against SimpleDateFormat (8248434) where case-insensitive date format/parse failed in some of the new locales in JDK15. The root cause was that the case-insensitive String.regionMatches() method did not work with supplementary characters. The problem is that the method's spec does not expect case mappings of supplementary characters, possibly because it was overlooked in the first place, in JSR 204 - "Unicode Supplementary Character support". Similar behavior is observed in the other two case-insensitive methods, i.e., compareToIgnoreCase() and equalsIgnoreCase().

The fix is straightforward: compare strings on a code point basis, instead of a code unit (16-bit "char") basis. Technically this change will introduce a backward incompatibility, but I believe it is an incompatibility with wrong behavior, not true to the meaning of those methods' expectations.

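To make the difference concrete, here is a minimal sketch of the two approaches. It is not the code in the webrev; the class and method names (CodePointCompare, compareIgnoreCase, equalsIgnoreCaseByChar) are hypothetical, and the char-based method only approximates the old behavior. The Deseret letters U+10400 (capital) and U+10428 (small) are a convenient supplementary pair that has a case mapping.

```java
// Illustrative sketch only: hypothetical names, not the JDK implementation.
public final class CodePointCompare {
    private CodePointCompare() {}

    /** Folds one code point: upper case first, then lower case. */
    static int fold(int cp) {
        return Character.toLowerCase(Character.toUpperCase(cp));
    }

    /** Case-insensitive comparison on a code point basis. */
    public static int compareIgnoreCase(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cp1 = a.codePointAt(i);
            int cp2 = b.codePointAt(j);
            i += Character.charCount(cp1);
            j += Character.charCount(cp2);
            if (cp1 == cp2) {
                continue; // identical code points need no case mapping
            }
            int f1 = fold(cp1);
            int f2 = fold(cp2);
            if (f1 != f2) {
                return f1 - f2;
            }
        }
        // One side is exhausted: the side with chars left over sorts later.
        return (a.length() - i) - (b.length() - j);
    }

    /** Approximation of the char-based approach: each 16-bit unit is folded alone. */
    public static boolean equalsIgnoreCaseByChar(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int k = 0; k < a.length(); k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 == c2) {
                continue;
            }
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            // A lone surrogate has no case mapping, so pairs never match here.
            if (u1 != u2 && Character.toLowerCase(u1) != Character.toLowerCase(u2)) {
                return false;
            }
        }
        return true;
    }
}
```

With upper = new String(Character.toChars(0x10400)) and lower = new String(Character.toChars(0x10428)), the code point version treats the two strings as equal, while the char-based version does not, because the low surrogates 0xDC00 and 0xDC28 differ and neither has a case mapping of its own.
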
Naoto
