subject:"RFR\: 8338257\: UTF8 lengths should be size

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Wed, 28 Aug 2024 03:46:43 GMT, Dean Long wrote: >> Note that I do already document the assumptions here in the general comment >> in utf8.hpp: >> >> There is an additional assumption/expectation that our UTF8 API's are never >> dealing with >> invalid UTF8, and more generally that all UTF8

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Wed, 28 Aug 2024 01:24:43 GMT, David Holmes wrote: >>> If you try to accommodate arbitrary future use then every method in the VM >>> would need to enforce every single precondition and invariant it expects >>> "just in case" and that is not practical. >> >> I'm basically arguing for Functi

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 27 Aug 2024 23:54:08 GMT, Dean Long wrote: >> If you try to accommodate arbitrary future use then every method in the VM >> would need to enforce every single precondition and invariant it expects >> "just in case" and that is not practical. Code can and does take advantage >> of the e

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Tue, 27 Aug 2024 21:21:01 GMT, David Holmes wrote: > If you try to accommodate arbitrary future use then every method in the VM > would need to enforce every single precondition and invariant it expects > "just in case" and that is not practical. I'm basically arguing for Functional Testing

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 27 Aug 2024 16:51:21 GMT, Dean Long wrote: >> Why? I think that would have a large flow on effect. And this length does >> fit in an int. > > The worse case is len == SIZE_MAX and therefore num_chars == SIZE_MAX, which > won't fit in an int. If we say this will never happen because cur

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 27 Aug 2024 13:06:26 GMT, Thomas Stuefe wrote: >> IIUC for compact strings, with non-latin-1 each pair of bytes would require >> at most 3-bytes to encode so you'd need 2/3 of INT_MAX. With latin-1 it >> would be 1/2 INT_MAX. But yes I suppose in theory you might be able to get >> an o

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Tue, 27 Aug 2024 12:10:36 GMT, David Holmes wrote: >> src/hotspot/share/utilities/utf8.cpp line 127: >> >>> 125: prev = c; >>> 126: } >>> 127: return checked_cast(num_chars); >> >> Ideally, this function would return size_t. > > Why? I think that would have a large flow on effect. An

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Thomas Stuefe

On Tue, 27 Aug 2024 12:20:04 GMT, David Holmes wrote: >> I think the Java string would only need to be INT_MAX/3 in length, if all >> the characters require surrogate encoding. > > IIUC for compact strings, with non-latin-1 each pair of bytes would require > at most 3-bytes to encode so you'd n

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 27 Aug 2024 07:51:38 GMT, Dean Long wrote: >> I'm trying to reason if on 32-bit we could even create a large enough string >> for this to be a problem? Once we have the giant string `as_utf8` will have >> to allocate an array that is just as large if not larger. So for overflow to >> b

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 27 Aug 2024 08:22:57 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional >> commit since the last revision: >> >> more missing casts > > src/hotspot/share/utilities/utf8.cpp line 127: > >> 125: prev = c; >> 126: } >> 127: retur

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Tue, 27 Aug 2024 07:23:14 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 633: >> >>> 631: } >>> 632: >>> 633: int java_lang_String::utf8_length_as_int(oop java_string, typeArrayOop >>> value) { >> >> Why not call java_lang_String::utf8_length() here instead of

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Tue, 27 Aug 2024 07:20:27 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 588: >> >>> 586: size_t utf8_len = static_cast(length); >>> 587: const char* base = UNICODE::as_utf8(position, utf8_len); >>> 588: Symbol* sym = SymbolTable::new_symbol(base, >>

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Tue, 27 Aug 2024 07:09:33 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 555: >> >>> 553: bool is_latin1 = java_lang_String::is_latin1(java_string); >>> 554: >>> 555: if (length == 0) return nullptr; >> >> Should this be checking for length <= 0? It l

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long

On Tue, 27 Aug 2024 07:07:11 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 307: >> >>> 305: { >>> 306: ResourceMark rm; >>> 307: size_t utf8_len = static_cast(length); >> >> I think there should be an assert that length is not negative, probably at >> t

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 27 Aug 2024 03:36:00 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional >> commit since the last revision: >> >> more missing casts > > src/hotspot/share/classfile/javaClasses.cpp line 633: > >> 631: } >> 632: >> 633: int java_lang_S

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 27 Aug 2024 03:13:59 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional >> commit since the last revision: >> >> more missing casts > > src/hotspot/share/classfile/javaClasses.cpp line 588: > >> 586: size_t utf8_len = static_cast(

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread David Holmes

On Tue, 27 Aug 2024 01:09:09 GMT, Dean Long wrote: >> David Holmes has updated the pull request incrementally with one additional >> commit since the last revision: >> >> more missing casts > > src/hotspot/share/classfile/javaClasses.cpp line 307: > >> 305: { >> 306: ResourceMark rm; >

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread David Holmes

On Mon, 26 Aug 2024 23:37:15 GMT, Coleen Phillimore wrote: >> Why "> 0" ? > > Because length is an in which could be negative but you're passing it to > size_t. -Wsign-conversion might complain because you're changing signs. I > guess you know from context that it's a positive number, so ok.

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Coleen Phillimore

On Mon, 19 Aug 2024 23:08:35 GMT, David Holmes wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 639: >> >>> 637: if (length == 0) { >>> 638: return 0; >>> 639: } >> >> Maybe assert length > 0 here? > > Why "> 0" ? Because length is an in which could be negative but you're pas

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Coleen Phillimore

On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes wrote: >> This work has been split out from JDK-8328877: [JNI] The JNI Specification >> needs to address the limitations of integer UTF-8 String lengths >> >> The modified UTF-8 format used by the VM can require up to six bytes to >> represent one

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-19 Thread David Holmes

> This work has been split out from JDK-8328877: [JNI] The JNI Specification > needs to address the limitations of integer UTF-8 String lengths > > The modified UTF-8 format used by the VM can require up to six bytes to > represent one unicode character, but six byte characters are stored as UTF

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

29 matches

Site Navigation

Mail list logo

Footer information