> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one Unicode character, but six-byte characters are stored as UTF-16
> surrogate pairs. Hence the most bytes needed per Java `char` is 3, and so the
> maximum length is 3*`Integer.MAX_VALUE`. With compact strings, though, this
> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should
> therefore define UTF8 lengths as `size_t` to accommodate all possible
> representations. Higher-level APIs can still use `int` if they know the
> strings (e.g. symbols) are sufficiently constrained in length. See the
> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>
> As the existing JNI `GetStringUTFLength` still requires the current
> truncating behaviour of `UNICODE::utf8_length`, we add back
> `UNICODE::utf8_length_as_int` for it to use.
>
> Note that some APIs, like `UNICODE::as_utf8(const T* base, size_t& length)`,
> use `length` as an IN/OUT parameter: it is the incoming (int) length of the
> jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence.
> This makes some of the call sites a little messy with casts.
>
> Testing:
> - tiers 1-4
> - GHA
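
To make the length reasoning in the quoted description concrete, here is a minimal C++ sketch of the two flavours of length computation it mentions. This is not the HotSpot code: the names `modified_utf8_size`, `sketch_utf8_length` and `sketch_utf8_length_as_int` are hypothetical, and the `cap` parameter and the exact truncation rule (stop before the first character that would exceed the cap) are assumptions made so the behaviour can be shown on small inputs.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Bytes needed to encode one UTF-16 code unit (or one Latin-1 byte) in
// modified UTF-8. U+0000 takes two bytes, and each half of a surrogate
// pair is encoded on its own as three bytes.
static inline std::size_t modified_utf8_size(std::uint16_t c) {
  if (c >= 0x0001 && c <= 0x007F) return 1;
  if (c <= 0x07FF) return 2;  // also covers U+0000
  return 3;
}

// Hypothetical full-width length: the result is a size_t because it can
// reach 3 * INT_MAX, which does not fit in an int.
template <typename T>
std::size_t sketch_utf8_length(const T* base, int length) {
  assert(length >= 0 && (base != nullptr || length == 0));
  std::size_t result = 0;
  for (int i = 0; i < length; i++) {
    result += modified_utf8_size(base[i]);
  }
  return result;
}

// Hypothetical int-returning variant for callers that must keep an int
// result. Assumption: it truncates at the last complete encoding that
// still fits within `cap`.
template <typename T>
int sketch_utf8_length_as_int(const T* base, int length,
                              int cap = INT_MAX - 1) {
  assert(length >= 0 && cap >= 0);
  std::size_t result = 0;
  for (int i = 0; i < length; i++) {
    std::size_t sz = modified_utf8_size(base[i]);
    if (result + sz > static_cast<std::size_t>(cap)) {
      break;  // the next character would not fit: stop here
    }
    result += sz;
  }
  return static_cast<int>(result);
}
```

With `Integer.MAX_VALUE` equal to 2,147,483,647, the worst case of three bytes per `char` gives 6,442,450,941 bytes, which is why the untruncated variant above returns `size_t`. The default cap of `INT_MAX - 1` is likewise an assumption, chosen to leave room for a terminating NUL.
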

David Holmes has updated the pull request incrementally with one additional commit since the last revision:

  more missing casts

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20560/files
  - new: https://git.openjdk.org/jdk/pull/20560/files/8b651323..0c332e9d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=03-04

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20560.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560

PR: https://git.openjdk.org/jdk/pull/20560