On Tue, 13 Aug 2024 02:20:41 GMT, David Holmes <dhol...@openjdk.org> wrote:
> This work has been split out from JDK-8328877: [JNI] The JNI Specification > needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to > represent one unicode character, but six byte characters are stored as UTF-16 > surrogate pairs. Hence the most bytes per character is 3, and so the maximum > length is 3*`Integer.MAX_VALUE`. Though with compact strings this reduces to > 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define > UTF8 lengths as `size_t` to accommodate all possible representations. > Higher-level API's can still use `int` if they know the strings (eg symbols) > are sufficiently constrained in length. See the comments in utf8.hpp that > explain Strings, compact strings and the encoding. > > As the existing JNI `GetStringUTFLength` still requires the current > truncating behaviour of ` UNICODE::utf8_length` we add back > `UNICODE::utf8_length_as_int` for it to use. > > Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& length)` > use `length` as an IN/OUT parameter: it is the incoming (int) length of the > jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. > This makes some of the call sites a little messy with casts. > > Testing: > - tiers 1-4 > - GHA One thing that I think needs changing and some comments, but your change strikes a good balance of promoting enough parameters to size_t but not too many. Did you try to compile this with -Wconversion (not -Wsign-conversion) without -Werror? It doesn't look like GHA is configured for you here. src/hotspot/share/classfile/javaClasses.cpp line 586: > 584: ResourceMark rm; > 585: jbyte* position = (length == 0) ? nullptr : value->byte_at_addr(0); > 586: size_t utf8_len = length; These are sign conversions. This is too big to change here and not sure what the fan out would be but java_lang_String::length() should return unsigned and also arrayOop.hpp length. They're never negative. We should probably assert that (below). src/hotspot/share/classfile/javaClasses.cpp line 639: > 637: if (length == 0) { > 638: return 0; > 639: } Maybe assert length > 0 here? src/hotspot/share/classfile/javaClasses.cpp line 702: > 700: } else { > 701: jbyte* position = (length == 0) ? nullptr : value->byte_at_addr(0); > 702: return UNICODE::as_utf8(position, static_cast<int>(length), buf, > buflen); Don't you want checked_cast here not static casts. src/hotspot/share/prims/jni.cpp line 2226: > 2224: HOTSPOT_JNI_GETSTRINGUTFLENGTH_ENTRY(env, string); > 2225: oop java_string = JNIHandles::resolve_non_null(string); > 2226: jsize ret = java_lang_String::utf8_length_as_int(java_string); So the spec says that this should be jsize (signed int), which is why this is, right? src/hotspot/share/utilities/utf8.cpp line 512: > 510: > 511: template<typename T> > 512: int UNICODE::utf8_length_as_int(const T* base, int length) { Why was this parameter left as int and not size_t ? src/hotspot/share/utilities/utf8.hpp line 47: > 45: > 46: In the code below if we have latin-1 content then we treat the String's > data > 47: array as a jbyte[], else a jchar[]. The lengths of these arrays are > specified jchar is 16-bits right? So the max length if not latin-1 is INT_MAX/2 ? src/hotspot/share/utilities/utf8.hpp line 70: > 68: > 69: */ > 70: This is a good place for this comment. One could find it again if they need to understand this. ------------- PR Review: https://git.openjdk.org/jdk/pull/20560#pullrequestreview-2241255691 PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2292205627 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718945451 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718948865 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718950816 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718954620 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718959373 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718964159 PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1718966422