On Thu, 29 Aug 2024 12:37:47 GMT, David Holmes <dhol...@openjdk.org> wrote:

> > > > Many of these translations seem awkward, since they convert to size_t 
> > > > only to then convert back to int.
> > > 
> > > 
> > > Can you be more specific here please?
> > 
> > 
> > Certainly. For example, this construct:
> > ```
> >     size_t utf8_len = static_cast<size_t>(length);
> >     const char* base = UNICODE::as_utf8(position, utf8_len);
> >     Symbol* sym = SymbolTable::new_symbol(base, 
> > checked_cast<int>(utf8_len));
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > We introduce `utf8_len` as a `size_t` synonym for len, but since it 
> > originates from an `int`, its length must be <= INT_MIN. We also assume it 
> > is >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` 
> > and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely 
> > on `length` >= 0, we could just as well give it `length`. For 
> > `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a 
> > check. The check feels superfluous since `utf8_len` came from an int.
> > I assume this verbosity is for the benefit of the code reader, to make 
> > intent clear. Otherwise, we could have just continued to use length, and 
> > just cast it on the fly to unsigned or to size_t when calling 
> > `UNICODE::as_utf8`.
> 
> This is exactly the case I was referring to. The declaration here is:
> 
> ```
>  template<typename T> static char* as_utf8(const T* base, size_t& length);
> ```
> 
> whether length is an IN/OUT parameter that is the int array length going in 
> (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. 
> The out coming utf8 length can theoretically by > INT_MAX but if that were 
> the case in this code (which expects to be dealing with names that can be 
> symbols hence < 64K) then that would be a programming error which the 
> checked_cast would catch. And of course new_symbol checks for < 64K.

Oh, I completely missed that!

I wish we would use pointers instead of references in cases like these since 
that would make the intent immediately clear when looking at the call site.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2317607555

Reply via email to