On Fri, 19 Sep 2025 01:15:14 GMT, Guanqiang Han <g...@openjdk.org> wrote:

>> test/jdk/jdk/internal/util/TestUtfLen.java line 50:
>> 
>>> 48:         for (int i = 0; i < iterations; i++) {
>>> 49:             total += ModifiedUtf.utfLen(chunk, 0);
>>> 50:         }
>> 
>> Suggestion:
>> 
>>         long total = ModifiedUtf.utfLen(chunk.repeat(iterations), 0);
>
> String.repeat() cannot generate a string whose total length exceeds 
> Integer.MAX_VALUE due to internal limits. That’s why I used a small chunk and 
> accumulated the Modified UTF-8 length in a loop. It seems that a String cannot 
> hold more than Integer.MAX_VALUE characters.
> https://github.com/openjdk/jdk/blob/e3a4c28409ac62feee9efe069e3a3482e7e2cdd2/src/java.base/share/classes/java/lang/String.java#L4875

jshell --add-exports java.base/jdk.internal.util=ALL-UNNAMED
|  Welcome to JShell -- Version 24
|  For an introduction type: /help intro

jshell> import jdk.internal.util.ModifiedUtf;

jshell> var s = "\u0100\u0100\u2600".repeat(Integer.MAX_VALUE / 6 - 1);
s ==> "???????????????????????????????????????????????? ... ?????????????????????????"

jshell> ModifiedUtf.utfLen(s)
|  Error:
|  method utfLen in class jdk.internal.util.ModifiedUtf cannot be applied to given types;
|    required: java.lang.String,int
|    found:    java.lang.String
|    reason: actual and formal argument lists differ in length
|  ModifiedUtf.utfLen(s)
|  ^----------------^

jshell> ModifiedUtf.utfLen(s, 0)
$3 ==> -1789569716
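
The negative result is int overflow. As a minimal sketch of the arithmetic, assuming the standard Modified UTF-8 byte counts (1 byte for U+0001..U+007F, 2 bytes for U+0000 and U+0080..U+07FF, 3 bytes otherwise), the hypothetical helpers `modifiedUtfBytes` and `modifiedUtfLength` below recompute the value with a long accumulator. They are not `ModifiedUtf.utfLen`, so the sketch runs without `--add-exports` and without allocating the ~2 GB string.

```java
// Sketch only: modifiedUtfBytes/modifiedUtfLength are hypothetical helpers,
// not the JDK's jdk.internal.util.ModifiedUtf.utfLen.
public class UtfLenOverflow {

    // Modified UTF-8 byte count for one char:
    // U+0001..U+007F -> 1 byte, U+0000 and U+0080..U+07FF -> 2 bytes, otherwise 3 bytes.
    static int modifiedUtfBytes(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2; // also covers U+0000
        return 3;
    }

    // Exact length accumulated in a long, so it cannot wrap around.
    static long modifiedUtfLength(String s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += modifiedUtfBytes(s.charAt(i));
        }
        return total;
    }

    public static void main(String[] args) {
        String chunk = "\u0100\u0100\u2600";           // 2 + 2 + 3 = 7 bytes per repetition
        int count = Integer.MAX_VALUE / 6 - 1;         // 357913940, the repeat count used in jshell
        long exact = modifiedUtfLength(chunk) * count; // exact Modified UTF-8 length of s
        int wrapped = (int) exact;                     // what a 32-bit result ends up holding
        System.out.println(exact + " -> " + wrapped);  // prints "2505397580 -> -1789569716"
    }
}
```

The wrapped value matches the `$3` printed by jshell, which suggests `utfLen` accumulates in an int here.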


You can construct a String whose Modified UTF-8 length exceeds 
Integer.MAX_VALUE whenever its Modified UTF-8 form needs more bytes than its 
UTF-16 form, for example if all of its characters are 3-byte characters.
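
The all-3-byte case can be sized with plain arithmetic. This sketch uses hypothetical class and method names (nothing from the JDK). It finds the smallest number of characters in U+0800..U+FFFF whose Modified UTF-8 length exceeds Integer.MAX_VALUE. It then checks that the UTF-16 backing array for that many characters is still well under the int limit.

```java
// Sketch only: ThreeByteBound and minThreeByteCharsToOverflow are hypothetical names.
public class ThreeByteBound {

    // Smallest char count n with 3 * n > Integer.MAX_VALUE.
    static int minThreeByteCharsToOverflow() {
        return Integer.MAX_VALUE / 3 + 1; // 715827883
    }

    public static void main(String[] args) {
        int chars = minThreeByteCharsToOverflow();
        long utfBytes = 3L * chars;   // Modified UTF-8: 3 bytes per char in U+0800..U+FFFF
        long utf16Bytes = 2L * chars; // UTF-16 backing array: 2 bytes per char
        System.out.println(utfBytes > Integer.MAX_VALUE);   // true
        System.out.println(utf16Bytes < Integer.MAX_VALUE); // true
    }
}
```

So a string such as `"\u2600".repeat(715827883)` should in principle be constructible (roughly 1.4 GB of heap for the backing array), while its Modified UTF-8 length no longer fits in an int.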

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27285#discussion_r2361493886
