Thank you, Joe. I got it now. Will try out and benchmark.
Naoto
On 7/21/20 10:05 AM, Joe Wang wrote:
On 7/20/2020 8:58 PM, naoto.s...@oracle.com wrote:
The short-cut worked well. There's maybe a further optimization we
could do to rid us of the performance concern (or the need to run
additional performance tests). Consider the cases where strings in
comparison don't contain any sup characters, if we make the
toLower/UpperCase() block a method and call it before and after the
surrogate-check block, the routine would be effectively the same as
prior to the introduction of the surrogate-check block, and regular
comparisons would suffer the surrogate-check only once (the last
check). For strings that do contain sup characters then, the
toLower/UpperCase() method would have been called twice, but then we
don't care about the performance in that situation. You may call the
existing codePointAt method too when an extra getChar and performance
is not a concern (but that's your call.
Can you please elaborate this more? What's "the last check" here?
What I meant was that we could expand the 'short-cut' from case
sensitive to case insensitive, that is in addition to the line 337, do
that line 353 - 370 case-insensitive check as well.
I guess it can be explained better with code. I added inline comment:
for (int k1 = toffset, k2 = ooffset; k1 < tlast && k2 < olast;
k1++, k2++) {
int cp1 = (int)getChar(value, k1);
int cp2 = (int)getChar(other, k2);
// does a case-insensitive check:
if (checkEqual(cp1, cp2) == 0) {
continue;
}
// this block will be run once for strings that don't contain any
supplementary characters
// Check for supplementary characters case
cp1 = getSupplementaryCodePoint(value, cp1, k1, toffset,
tlast);
if ((cp1 & Integer.MIN_VALUE) != 0) {
k1++;
cp1 ^= Integer.MIN_VALUE;
}
cp2 = getSupplementaryCodePoint(other, cp2, k2, ooffset,
olast);
if ((cp2 & Integer.MIN_VALUE) != 0) {
k2++;
cp2 ^= Integer.MIN_VALUE;
}
// thischeck will have been called twice for strings that contain
supplementary characters
// but only one more for strings that don't
int diff = checkEqual(cp1, cp2);
if (diff != 0) {
return diff;
}
}
return tlen - olen;
}
// the code block between line 353 - 370 in webrev.04 except the last
line (return 0):
private static int checkEqual(int cp1, int cp2) {
if (cp1 != cp2) {
// try converting both characters to uppercase.
// If the results match, then the comparison scan should
// continue.
cp1 = Character.toUpperCase(cp1);
cp2 = Character.toUpperCase(cp2);
if (cp1 != cp2) {
// Unfortunately, conversion to uppercase does not work
properly
// for the Georgian alphabet, which has strange rules
about case
// conversion. So we need to make one last check before
// exiting.
cp1 = Character.toLowerCase(cp1);
cp2 = Character.toLowerCase(cp2);
if (cp1 != cp2) {
return cp1 - cp2;
}
}
}
return 0;
}
Naoto