On Thu, 29 Aug 2024 13:06:19 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:

>>> > > Many of these translations seem awkward, since they convert to size_t 
>>> > > only to then convert back to int.
>>> > 
>>> > Can you be more specific here please?
>>> 
>>> Certainly. For example, this construct:
>>> 
>>> ```
>>>     size_t utf8_len = static_cast<size_t>(length);
>>>     const char* base = UNICODE::as_utf8(position, utf8_len);
>>>     Symbol* sym = SymbolTable::new_symbol(base, 
>>> checked_cast<int>(utf8_len));
>>> ```
>>> 
>>> We introduce `utf8_len` as a `size_t` synonym for len, but since it 
>>> originates from an `int`, its length must be <= INT_MIN. We also assume it 
>>> is >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` 
>>> and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely 
>>> on `length` >= 0, we could just as well give it `length`. For 
>>> `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a 
>>> check. The check feels superfluous since `utf8_len` came from an int.
>>> 
>>> I assume this verbosity is for the benefit of the code reader, to make 
>>> intent clear. Otherwise, we could have just continued to use length, and 
>>> just cast it on the fly to unsigned or to size_t when calling 
>>> `UNICODE::as_utf8`.
>> 
>> This is exactly the case I was referring to. The declaration here is:
>> 
>>  template<typename T> static char* as_utf8(const T* base, size_t& length);
>> 
>> whether length is an IN/OUT parameter that is the int array length going in 
>> (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. 
>> The out coming utf8 length can theoretically by > INT_MAX but if that were 
>> the case in this code (which expects to be dealing with names that can be 
>> symbols hence < 64K) then that would be a programming error which the 
>> checked_cast would catch. And of course new_symbol checks for < 64K.
>
>> > > > Many of these translations seem awkward, since they convert to size_t 
>> > > > only to then convert back to int.
>> > > 
>> > > 
>> > > Can you be more specific here please?
>> > 
>> > 
>> > Certainly. For example, this construct:
>> > ```
>> >     size_t utf8_len = static_cast<size_t>(length);
>> >     const char* base = UNICODE::as_utf8(position, utf8_len);
>> >     Symbol* sym = SymbolTable::new_symbol(base, 
>> > checked_cast<int>(utf8_len));
>> > ```
>> > 
>> > 
>> >     
>> >       
>> >     
>> > 
>> >       
>> >     
>> > 
>> >     
>> >   
>> > We introduce `utf8_len` as a `size_t` synonym for len, but since it 
>> > originates from an `int`, its length must be <= INT_MIN. We also assume it 
>> > is >=0, but we don't check. We feed `utf8_len` into both 
>> > `UNICODE::as_utf8` and `SymbolTable::new_symbol`. The former takes a 
>> > size_t, but since we rely on `length` >= 0, we could just as well give it 
>> > `length`. For `SymbolTable::new_symbol`, we translate `utf8_len` back to 
>> > `int`, with a check. The check feels superfluous since `utf8_len` came 
>> > from an int.
>> > I assume this verbosity is for the benefit of the code reader, to make 
>> > intent clear. Otherwise, we could have just continued to use length, and 
>> > just cast it on the fly to unsigned or to size_t when calling 
>> > `UNICODE::as_utf8`.
>> 
>> This is exactly the case I was referring to. The declaration here is:
>> 
>> ```
>>  template<typename T> static char* as_utf8(const T* base, size_t& length);
>> ```
>> 
>> whether length is an IN/OUT parameter that is the int array length going in 
>> (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. 
>> The out coming utf8 length can theoretically by > INT_MAX but if that were 
>> the case in this code (which expects to be dealing with names that can be 
>> symbols hence < 64K) then that would be a programming error which the 
>> checked_cast would catch. And of course new_symbol checks for < 64K.
> 
> Oh, I completely missed that!
> 
> I wish we would use pointers instead of references in cases like these since 
> that would make the intent immediately clear when looking at the call site.

Thanks for the review and discussion @tstuefe .

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2318932417

Reply via email to