On Tue, 16 Jun 2026 20:53:07 GMT, Naoto Sato <[email protected]> wrote:

>> vs:
>> 
>> 
>> for (int i = 0; i < lastIndex;) {
>>     if (Character.isHighSurrogate(charAt(i++))) {
>>         if (i >= lastIndex) break;
>>         if (Character.isLowSurrogate(charAt(i))) {
>>             n--;
>>             i++;
>>         }
>>     }
>> }
>> 
>> 
>> - No `else`.
>> - No state variables.
>> - Branch prediction for the second and third `if` statements will succeed 
>> 100% of the time for well-formed code unit sequences (normal strings).
>
> Does the suggested code have a bug? I think the code returns 2 for 
> "\ud800\udc00". The loop breaks before the last low surrogate.

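I concluded that `if (i >= lastIndex) break;` is harmful: it exits the 
loop before the trailing low surrogate is checked, so a pair at the end 
of the string is never counted as one code point.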


for (int i = 0; i < lastIndex;) {
    if (Character.isHighSurrogate(charAt(i++))) {
        if (Character.isLowSurrogate(charAt(i))) {
            n--;
            i++;
        }
    }
}


should work as intended. After the `i++`, `i` stays within `[0, 
lastIndex]` (that is, `[0, length)`) as long as `i < lastIndex` held 
before the increment, so the inner `charAt(i)` is always in bounds.
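
For anyone who wants to check this locally, here is a minimal 
stand-alone sketch of both variants. The class and method names are 
hypothetical, and it assumes `n` starts at `length()` and `lastIndex` is 
`length() - 1`, which the snippets above do not show. It compares the 
result against `String.codePointCount` for a few well-formed and 
ill-formed inputs.

```java
class SurrogatePairCount {
    // Variant without the early break. Assumption: n = length, lastIndex = length - 1.
    static int countWithoutBreak(CharSequence s) {
        int n = s.length();
        int lastIndex = n - 1;
        for (int i = 0; i < lastIndex;) {
            if (Character.isHighSurrogate(s.charAt(i++))) {
                // i <= lastIndex here, so charAt(i) is in bounds.
                if (Character.isLowSurrogate(s.charAt(i))) {
                    n--;   // a surrogate pair is one code point, not two
                    i++;
                }
            }
        }
        return n;
    }

    // Earlier variant with the early break, kept only to show the miscount.
    static int countWithBreak(CharSequence s) {
        int n = s.length();
        int lastIndex = n - 1;
        for (int i = 0; i < lastIndex;) {
            if (Character.isHighSurrogate(s.charAt(i++))) {
                if (i >= lastIndex) break;   // skips a pair ending at lastIndex
                if (Character.isLowSurrogate(s.charAt(i))) {
                    n--;
                    i++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] samples = {
            "", "a", "\ud800\udc00", "\ud800", "\udc00\ud800",
            "a\ud800", "\ud800\ud800\udc00", "a\ud83d\ude00b"
        };
        for (String s : samples) {
            int expected = s.codePointCount(0, s.length());
            int actual = countWithoutBreak(s);
            if (expected != actual) {
                throw new AssertionError("length " + s.length()
                        + ": expected " + expected + ", got " + actual);
            }
        }
        // The variant with the break counts a lone trailing pair as two.
        System.out.println(countWithBreak("\ud800\udc00"));    // prints 2
        System.out.println(countWithoutBreak("\ud800\udc00")); // prints 1
    }
}
```

The sample list is small and not exhaustive; it only covers the trailing 
pair, lone surrogates at either end, and a high surrogate followed by a 
full pair.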

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r3427172848
