On Mon, 19 Aug 2024 23:19:04 GMT, David Holmes <dhol...@openjdk.org> wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one unicode character, but six byte characters are stored as >> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the >> maximum length is 3*`Integer.MAX_VALUE`. Though with compact strings this >> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should >> therefore define UTF8 lengths as `size_t` to accommodate all possible >> representations. Higher-level API's can still use `int` if they know the >> strings (eg symbols) are sufficiently constrained in length. See the >> comments in utf8.hpp that explain Strings, compact strings and the encoding. >> >> As the existing JNI `GetStringUTFLength` still requires the current >> truncating behaviour of ` UNICODE::utf8_length` we add back >> `UNICODE::utf8_length_as_int` for it to use. >> >> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& >> length)` use `length` as an IN/OUT parameter: it is the incoming (int) >> length of the jbyte/jchar array, and the outgoing (size_t) length of the >> UTF8 sequence. This makes some of the call sites a little messy with casts. >> >> Testing: >> - tiers 1-4 >> - GHA > > David Holmes has updated the pull request incrementally with one additional > commit since the last revision: > > Add missing cast for signed-to-unsigned converion. src/hotspot/share/classfile/javaClasses.cpp line 695: > 693: // `length` is used as the incoming number of characters to > 694: // convert, and then set as the number of bytes in the UTF8 sequence. > 695: size_t length = java_lang_String::length(java_string, value); Above comment looks wrong. `length` is not an in/out reference below, so why not leave it as `int` at line 695? Assigning `int` to `size_t` will introduce a -Wsign-conversion warning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722633109