On Wed, 28 Aug 2024 03:46:43 GMT, Dean Long wrote:
>> Note that I do already document the assumptions here in the general comment
>> in utf8.hpp:
>>
There is an additional assumption/expectation that our UTF8 APIs are never
>> dealing with
>> invalid UTF8, and more generally that all UTF8
On Wed, 28 Aug 2024 01:24:43 GMT, David Holmes wrote:
>>> If you try to accommodate arbitrary future use then every method in the VM
>>> would need to enforce every single precondition and invariant it expects
>>> "just in case" and that is not practical.
>>
>> I'm basically arguing for Functi
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 27 Aug 2024 23:54:08 GMT, Dean Long wrote:
>> If you try to accommodate arbitrary future use then every method in the VM
>> would need to enforce every single precondition and invariant it expects
>> "just in case" and that is not practical. Code can and does take advantage
>> of the e
On Tue, 27 Aug 2024 21:21:01 GMT, David Holmes wrote:
> If you try to accommodate arbitrary future use then every method in the VM
> would need to enforce every single precondition and invariant it expects
> "just in case" and that is not practical.
I'm basically arguing for Functional Testing
On Tue, 27 Aug 2024 16:51:21 GMT, Dean Long wrote:
>> Why? I think that would have a large flow on effect. And this length does
>> fit in an int.
>
> The worst case is len == SIZE_MAX and therefore num_chars == SIZE_MAX, which
> won't fit in an int. If we say this will never happen because cur
On Tue, 27 Aug 2024 13:06:26 GMT, Thomas Stuefe wrote:
>> IIUC for compact strings, with non-latin-1 each pair of bytes would require
>> at most 3-bytes to encode so you'd need 2/3 of INT_MAX. With latin-1 it
>> would be 1/2 INT_MAX. But yes I suppose in theory you might be able to get
>> an o
On Tue, 27 Aug 2024 12:10:36 GMT, David Holmes wrote:
>> src/hotspot/share/utilities/utf8.cpp line 127:
>>
>>> 125: prev = c;
>>> 126: }
>>> 127: return checked_cast<int>(num_chars);
>>
>> Ideally, this function would return size_t.
>
> Why? I think that would have a large flow on effect. An
On Tue, 27 Aug 2024 12:20:04 GMT, David Holmes wrote:
>> I think the Java string would only need to be INT_MAX/3 in length, if all
>> the characters require surrogate encoding.
>
> IIUC for compact strings, with non-latin-1 each pair of bytes would require
> at most 3-bytes to encode so you'd n
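The byte counts being discussed can be sketched as follows. This is a minimal illustration, not HotSpot code: the helper names `modified_utf8_unit_size` and `modified_utf8_length` are hypothetical. It assumes the usual modified UTF-8 rules, where each UTF-16 code unit takes one, two or three bytes, U+0000 takes two bytes, and a surrogate pair is encoded as two three-byte sequences. The total is accumulated in `size_t` so that it cannot overflow an `int` while being computed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed to encode one UTF-16 code unit in
// modified UTF-8.
inline std::size_t modified_utf8_unit_size(std::uint16_t c) {
  if (c >= 0x0001 && c <= 0x007F) {
    return 1;  // plain ASCII, except NUL
  }
  if (c <= 0x07FF) {
    return 2;  // U+0000 and U+0080..U+07FF
  }
  return 3;    // U+0800..U+FFFF, including each half of a surrogate pair
}

// Hypothetical helper: total modified UTF-8 length of a UTF-16 sequence.
// The sum is kept in size_t rather than int.
inline std::size_t modified_utf8_length(const std::uint16_t* units,
                                        std::size_t count) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < count; i++) {
    total += modified_utf8_unit_size(units[i]);
  }
  return total;
}
```

Under these assumptions a string made up entirely of three-byte code units needs three times as many bytes as it has code units, which is why the encoded length can exceed INT_MAX even when the string length itself fits in an int.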
On Tue, 27 Aug 2024 07:51:38 GMT, Dean Long wrote:
>> I'm trying to reason if on 32-bit we could even create a large enough string
>> for this to be a problem? Once we have the giant string `as_utf8` will have
>> to allocate an array that is just as large if not larger. So for overflow to
>> b
On Tue, 27 Aug 2024 08:22:57 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> more missing casts
>
> src/hotspot/share/utilities/utf8.cpp line 127:
>
>> 125: prev = c;
>> 126: }
>> 127: retur
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 27 Aug 2024 07:23:14 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 633:
>>
>>> 631: }
>>> 632:
>>> 633: int java_lang_String::utf8_length_as_int(oop java_string, typeArrayOop
>>> value) {
>>
>> Why not call java_lang_String::utf8_length() here instead of
On Tue, 27 Aug 2024 07:20:27 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 588:
>>
>>> 586: size_t utf8_len = static_cast<size_t>(length);
>>> 587: const char* base = UNICODE::as_utf8(position, utf8_len);
>>> 588: Symbol* sym = SymbolTable::new_symbol(base,
>>
On Tue, 27 Aug 2024 07:09:33 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 555:
>>
>>> 553: bool is_latin1 = java_lang_String::is_latin1(java_string);
>>> 554:
>>> 555: if (length == 0) return nullptr;
>>
>> Should this be checking for length <= 0? It l
On Tue, 27 Aug 2024 07:07:11 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 307:
>>
>>> 305: {
>>> 306: ResourceMark rm;
>>> 307: size_t utf8_len = static_cast<size_t>(length);
>>
>> I think there should be an assert that length is not negative, probably at
>> t
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 27 Aug 2024 03:36:00 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> more missing casts
>
> src/hotspot/share/classfile/javaClasses.cpp line 633:
>
>> 631: }
>> 632:
>> 633: int java_lang_S
On Tue, 27 Aug 2024 03:13:59 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> more missing casts
>
> src/hotspot/share/classfile/javaClasses.cpp line 588:
>
>> 586: size_t utf8_len = static_cast<size_t>(
On Tue, 27 Aug 2024 01:09:09 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> more missing casts
>
> src/hotspot/share/classfile/javaClasses.cpp line 307:
>
>> 305: {
>> 306: ResourceMark rm;
>
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Mon, 26 Aug 2024 23:37:15 GMT, Coleen Phillimore wrote:
>> Why "> 0" ?
>
> Because length is an int which could be negative but you're passing it to
> size_t. -Wsign-conversion might complain because you're changing signs. I
> guess you know from context that it's a positive number, so ok.
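The concern raised here, a signed `int` length being converted to `size_t`, can be illustrated with a small sketch. The helper name `length_to_size_t` is hypothetical and is not part of the patch under review. It only shows the pattern of asserting the non-negative precondition before the cast, so that the sign change flagged by -Wsign-conversion is explicit and checked in debug builds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not HotSpot code: convert a signed length to size_t
// only after confirming that it is not negative. Without the check, a
// negative int would silently become a very large unsigned value.
inline std::size_t length_to_size_t(int length) {
  assert(length >= 0 && "length must be non-negative");
  return static_cast<std::size_t>(length);
}
```

Whether such a check belongs at each call site or only where the length is first obtained is the design question the reviewers are debating.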
On Mon, 19 Aug 2024 23:08:35 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 639:
>>
>>> 637: if (length == 0) {
>>> 638: return 0;
>>> 639: }
>>
>> Maybe assert length > 0 here?
>
> Why "> 0" ?
Because length is an int which could be negative but you're pas
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one Unicode character, but six byte characters are stored as UTF