On Thu, 29 Aug 2024 13:06:19 GMT, Thomas Stuefe wrote:
>>> > > Many of these translations seem awkward, since they convert to size_t
>>> > > only to then convert back to int.
>>> >
>>> > Can you be more specific here please?
>>>
>>> Certainly. For example, this construct:
>>>
>>> ```
>>>
On Thu, 29 Aug 2024 12:37:47 GMT, David Holmes wrote:
> > > > Many of these translations seem awkward, since they convert to size_t
> > > > only to then convert back to int.
> > >
> > >
> > > Can you be more specific here please?
> >
> >
> > Certainly. For example, this construct:
> > ```
>
On Thu, 29 Aug 2024 12:30:26 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.hpp line 138:
>>
>>> 136: // Legacy variants that truncate the length if needed
>>> 137: static intutf8_length_as_int(oop java_string);
>>> 138: static intutf8_length_as_int(oop java_st
On Thu, 29 Aug 2024 08:38:41 GMT, Thomas Stuefe wrote:
> > > Many of these translations seem awkward, since they convert to size_t
> > > only to then convert back to int.
> >
> > Can you be more specific here please?
>
> Certainly. For example, this construct:
>
> ```
> size_t utf8_len =
On Thu, 29 Aug 2024 08:52:03 GMT, Thomas Stuefe wrote:
>> David Holmes has updated the pull request with a new target base due to a
>> merge or a rebase. The incremental webrev excludes the unrelated changes
>> brought in by the merge/rebase. The pull request contains 13 additional
>> commits
On Thu, 29 Aug 2024 02:45:53 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Thu, 29 Aug 2024 07:59:47 GMT, David Holmes wrote:
> > Many of these translations seem awkward, since they convert to size_t only
> > to then convert back to int.
>
> Can you be more specific here please?
Certainly. For example, this construct:
size_t utf8_len = static_cast(length);
On Thu, 29 Aug 2024 05:53:25 GMT, Thomas Stuefe wrote:
> Many of these translations seem awkward, since they convert to size_t only to
> then convert back to int.
Can you be more specific here please? I know the most awkward case is where we
have an in/out parameter that brings in an int and s
On Thu, 29 Aug 2024 05:52:26 GMT, Thomas Stuefe wrote:
> Many of these translations seem awkward, since they convert to size_t only to
> then convert back to int.
>
> Proposal: I undestand you need to find a good point to tourniquet off the
> int->size_t conversion to minimize the translations
On Thu, 29 Aug 2024 02:45:53 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one unicode character, but six byte characters are stored as UTF
> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one unicode character, but six byte characters are stored as UTF
On Wed, 28 Aug 2024 03:46:43 GMT, Dean Long wrote:
>> Note that I do already document the assumptions here in the general comment
>> in utf8.hpp:
>>
>> There is an additional assumption/expectation that our UTF8 API's are never
>> dealing with
>> invalid UTF8, and more generally that all UTF8
On Wed, 28 Aug 2024 01:24:43 GMT, David Holmes wrote:
>>> If you try to accommodate arbitrary future use then every method in the VM
>>> would need to enforce every single precondition and invariant it expects
>>> "just in case" and that is not practical.
>>
>> I'm basically arguing for Functi
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 27 Aug 2024 23:54:08 GMT, Dean Long wrote:
>> If you try to accommodate arbitrary future use then every method in the VM
>> would need to enforce every single precondition and invariant it expects
>> "just in case" and that is not practical. Code can and does take advantage
>> of the e
On Tue, 27 Aug 2024 21:21:01 GMT, David Holmes wrote:
> If you try to accommodate arbitrary future use then every method in the VM
> would need to enforce every single precondition and invariant it expects
> "just in case" and that is not practical.
I'm basically arguing for Functional Testing
On Tue, 27 Aug 2024 16:51:21 GMT, Dean Long wrote:
>> Why? I think that would have a large flow on effect. And this length does
>> fit in an int.
>
> The worse case is len == SIZE_MAX and therefore num_chars == SIZE_MAX, which
> won't fit in an int. If we say this will never happen because cur
On Tue, 27 Aug 2024 13:06:26 GMT, Thomas Stuefe wrote:
>> IIUC for compact strings, with non-latin-1 each pair of bytes would require
>> at most 3-bytes to encode so you'd need 2/3 of INT_MAX. With latin-1 it
>> would be 1/2 INT_MAX. But yes I suppose in theory you might be able to get
>> an o
On Tue, 27 Aug 2024 12:10:36 GMT, David Holmes wrote:
>> src/hotspot/share/utilities/utf8.cpp line 127:
>>
>>> 125: prev = c;
>>> 126: }
>>> 127: return checked_cast(num_chars);
>>
>> Ideally, this function would return size_t.
>
> Why? I think that would have a large flow on effect. An
On Tue, 27 Aug 2024 12:20:04 GMT, David Holmes wrote:
>> I think the Java string would only need to be INT_MAX/3 in length, if all
>> the characters require surrogate encoding.
>
> IIUC for compact strings, with non-latin-1 each pair of bytes would require
> at most 3-bytes to encode so you'd n
On Tue, 27 Aug 2024 07:51:38 GMT, Dean Long wrote:
>> I'm trying to reason if on 32-bit we could even create a large enough string
>> for this to be a problem? Once we have the giant string `as_utf8` will have
>> to allocate an array that is just as large if not larger. So for overflow to
>> b
On Tue, 27 Aug 2024 08:22:57 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> more missing casts
>
> src/hotspot/share/utilities/utf8.cpp line 127:
>
>> 125: prev = c;
>> 126: }
>> 127: retur
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 27 Aug 2024 07:23:14 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 633:
>>
>>> 631: }
>>> 632:
>>> 633: int java_lang_String::utf8_length_as_int(oop java_string, typeArrayOop
>>> value) {
>>
>> Why not call java_lang_String::utf8_length() here instead of
On Tue, 27 Aug 2024 07:20:27 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 588:
>>
>>> 586: size_t utf8_len = static_cast(length);
>>> 587: const char* base = UNICODE::as_utf8(position, utf8_len);
>>> 588: Symbol* sym = SymbolTable::new_symbol(base,
>>
On Tue, 27 Aug 2024 07:09:33 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 555:
>>
>>> 553: bool is_latin1 = java_lang_String::is_latin1(java_string);
>>> 554:
>>> 555: if (length == 0) return nullptr;
>>
>> Should this be checking for length <= 0? It l
On Tue, 27 Aug 2024 07:07:11 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 307:
>>
>>> 305: {
>>> 306: ResourceMark rm;
>>> 307: size_t utf8_len = static_cast(length);
>>
>> I think there should be an assert that length is not negative, probably at
>> t
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 27 Aug 2024 03:36:00 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> more missing casts
>
> src/hotspot/share/classfile/javaClasses.cpp line 633:
>
>> 631: }
>> 632:
>> 633: int java_lang_S
On Tue, 27 Aug 2024 03:13:59 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> more missing casts
>
> src/hotspot/share/classfile/javaClasses.cpp line 588:
>
>> 586: size_t utf8_len = static_cast(
On Tue, 27 Aug 2024 01:09:09 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> more missing casts
>
> src/hotspot/share/classfile/javaClasses.cpp line 307:
>
>> 305: {
>> 306: ResourceMark rm;
>
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Mon, 26 Aug 2024 23:37:15 GMT, Coleen Phillimore wrote:
>> Why "> 0" ?
>
> Because length is an in which could be negative but you're passing it to
> size_t. -Wsign-conversion might complain because you're changing signs. I
> guess you know from context that it's a positive number, so ok.
On Thu, 15 Aug 2024 20:44:52 GMT, Coleen Phillimore wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represen
On Mon, 19 Aug 2024 23:08:35 GMT, David Holmes wrote:
>> src/hotspot/share/classfile/javaClasses.cpp line 639:
>>
>>> 637: if (length == 0) {
>>> 638: return 0;
>>> 639: }
>>
>> Maybe assert length > 0 here?
>
> Why "> 0" ?
Because length is an in which could be negative but you're pas
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Thu, 15 Aug 2024 20:44:52 GMT, Coleen Phillimore wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represen
> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one unicode character, but six byte characters are stored as UTF
> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one unicode character, but six byte characters are stored as UTF
> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one unicode character, but six byte characters are stored as UTF
On Tue, 20 Aug 2024 03:16:29 GMT, Dean Long wrote:
>> David Holmes has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Add missing cast for signed-to-unsigned converion.
>
> src/hotspot/share/classfile/javaClasses.cpp line 695:
>
>> 693:
On Mon, 19 Aug 2024 23:19:04 GMT, David Holmes wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represent one
On Thu, 15 Aug 2024 20:43:51 GMT, Coleen Phillimore wrote:
> Did you try to compile this with -Wconversion (not -Wsign-conversion) without
> -Werror?
I hadn't but will do so ...
> src/hotspot/share/classfile/javaClasses.cpp line 586:
>
>> 584: ResourceMark rm;
>> 585: jbyte* position
> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one unicode character, but six byte characters are stored as UTF
On Thu, 15 Aug 2024 20:44:52 GMT, Coleen Phillimore wrote:
>> This work has been split out from JDK-8328877: [JNI] The JNI Specification
>> needs to address the limitations of integer UTF-8 String lengths
>>
>> The modified UTF-8 format used by the VM can require up to six bytes to
>> represen
On Tue, 13 Aug 2024 02:20:41 GMT, David Holmes wrote:
> This work has been split out from JDK-8328877: [JNI] The JNI Specification
> needs to address the limitations of integer UTF-8 String lengths
>
> The modified UTF-8 format used by the VM can require up to six bytes to
> represent one unic
This work has been split out from JDK-8328877: [JNI] The JNI Specification
needs to address the limitations of integer UTF-8 String lengths
The modified UTF-8 format used by the VM can require up to six bytes to
represent one unicode character, but six byte characters are stored as UTF-16
surro
52 matches
Mail list logo