Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

Dean Long Tue, 27 Aug 2024 00:54:25 -0700

On Tue, 27 Aug 2024 07:20:27 GMT, David Holmes <dhol...@openjdk.org> wrote:


>> src/hotspot/share/classfile/javaClasses.cpp line 588:
>> 
>>> 586:     size_t utf8_len = static_cast<size_t>(length);
>>> 587:     const char* base = UNICODE::as_utf8(position, utf8_len);
>>> 588:     Symbol* sym = SymbolTable::new_symbol(base, 
>>> checked_cast<int>(utf8_len));
>> 
>> With the current limitations of checked_cast(), we would also need to check 
>> if the result is negative on 32-bit platforms, because then size_t and int 
>> will be the same size, and checked_cast will never complain.
>
> I'm trying to reason if on 32-bit we could even create a large enough string 
> for this to be a problem? Once we have the giant string `as_utf8` will have 
> to allocate an array that is just as large if not larger. So for overflow to 
> be an issue we need a string of length INT_MAX - which is limited to 2GB and 
> then we have to allocate a resource array of 2GB as well. So we need to have 
> allocated 4GB which is our entire address space on 32-bit. So I don't think 
> we can ever hit a problem on 32-bit where the size_t utf8 length would 
> convert to a negative int.

I think the Java string would only need to be about INT_MAX/3 chars long if 
every char needs the 3-byte encoding, as surrogates do. The UTF-8 length 
would then already reach INT_MAX.
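
To make the arithmetic concrete, here is a rough sketch. It assumes a 32-bit 
int and that each UTF-16 code unit (a surrogate, for example) encodes to 3 
bytes. The names `worst_case_utf8_bytes` and `fits_in_int` are made up for 
illustration and are not HotSpot functions. Lengths are held in 64 bits so 
the check does not depend on the width of size_t.

```cpp
#include <cstdint>
#include <limits>

// Assumption: int is 32 bits, as on the platforms discussed above.
constexpr std::uint64_t kIntMax = std::numeric_limits<std::int32_t>::max();

// Hypothetical helper: worst-case UTF-8 byte count when every UTF-16
// code unit needs 3 bytes.
constexpr std::uint64_t worst_case_utf8_bytes(std::uint64_t utf16_units) {
  return utf16_units * 3;
}

// Hypothetical helper: an explicit range check. Unlike a check that only
// compares type widths, this still works when size_t and int are both 32 bits.
constexpr bool fits_in_int(std::uint64_t utf8_len) {
  return utf8_len <= kIntMax;
}

// Smallest string length (in UTF-16 code units) whose worst-case UTF-8
// length no longer fits in an int: 715827883 units -> 2147483649 bytes.
constexpr std::uint64_t kUnits = kIntMax / 3 + 1;

static_assert(fits_in_int(worst_case_utf8_bytes(kIntMax / 3)),
              "INT_MAX/3 units still fit in an int");
static_assert(!fits_in_int(worst_case_utf8_bytes(kUnits)),
              "one more unit pushes the UTF-8 length past INT_MAX");
```

So a string well short of INT_MAX chars is enough for the utf8 length to 
stop fitting in an int, which is why an explicit range check seems safer 
than relying on checked_cast alone on 32-bit.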

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732326074
