Hi,
Based on the suggestions, I modified the fix as follows:
https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.01/
Changes from the initial revision are:
- Shared the implementation between compareToCI() and regionMatchesCI()
- Enabled immediate short cut if two code points match.
- Created a simple JMH benchmark. Here is the scores before and after
the change:
before:
Benchmark Mode Cnt Score Error Units
StringCompareToIgnoreCase.lower avgt 25 53.764 ? 2.811 ns/op
StringCompareToIgnoreCase.supLower avgt 25 24.211 ? 1.135 ns/op
StringCompareToIgnoreCase.supUpperLower avgt 25 30.595 ? 1.344 ns/op
StringCompareToIgnoreCase.upperLower avgt 25 18.859 ? 1.499 ns/op
after:
Benchmark Mode Cnt Score Error Units
StringCompareToIgnoreCase.lower avgt 25 58.354 ? 4.603 ns/op
StringCompareToIgnoreCase.supLower avgt 25 57.975 ? 5.672 ns/op
StringCompareToIgnoreCase.supUpperLower avgt 25 23.912 ? 0.965 ns/op
StringCompareToIgnoreCase.upperLower avgt 25 17.744 ? 0.272 ns/op
Here, "sup" means all supplementary characters, BMP otherwise. "lower"
means each character requires upper->lower casemap. "upperLower" means
all characters are the same, except the last character which requires
casemap.
I think the result is reasonable, considering surrogates check are now
mandatory. I have tried Roger's suggestion to use Arrays.mismatch() but
it did not seem to benefit here. In fact, the performance degraded
partly because I implemented the short cut, and possibly for the
overhead of extra checks.
Naoto
On 7/15/20 9:00 AM, naoto.s...@oracle.com wrote:
Hello,
Please review the fix to the following issues:
https://bugs.openjdk.java.net/browse/JDK-8248655
https://bugs.openjdk.java.net/browse/JDK-8248434
The proposed changeset and its CSR are located at:
https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8248664
A bug was filed against SimpleDateFormat (8248434) where
case-insensitive date format/parse failed in some of the new locales in
JDK15. The root cause was that case-insensitive String.regionMatches()
method did not work with supplementary characters. The problem is that
the method's spec does not expect case mappings of supplementary
characters, possibly because it was overlooked in the first place, JSR
204 - "Unicode Supplementary Character support". Similar behavior is
observed in other two case-insensitive methods, i.e.,
compareToIgnoreCase() and equalsIgnoreCase().
The fix is straightforward to compare strings by code point basis,
instead of code unit (16bit "char") basis. Technically this change will
introduce a backward incompatibility, but I believe it is an
incompatibility to wrong behavior, not true to the meaning of those
methods' expectations.
Naoto