On Thu, 29 Aug 2024 12:37:47 GMT, David Holmes <dhol...@openjdk.org> wrote:
> > > > Many of these translations seem awkward, since they convert to size_t > > > > only to then convert back to int. > > > > > > > > > Can you be more specific here please? > > > > > > Certainly. For example, this construct: > > ``` > > size_t utf8_len = static_cast<size_t>(length); > > const char* base = UNICODE::as_utf8(position, utf8_len); > > Symbol* sym = SymbolTable::new_symbol(base, > > checked_cast<int>(utf8_len)); > > ``` > > > > > > > > > > > > > > > > > > > > > > > > We introduce `utf8_len` as a `size_t` synonym for len, but since it > > originates from an `int`, its length must be <= INT_MIN. We also assume it > > is >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` > > and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely > > on `length` >= 0, we could just as well give it `length`. For > > `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a > > check. The check feels superfluous since `utf8_len` came from an int. > > I assume this verbosity is for the benefit of the code reader, to make > > intent clear. Otherwise, we could have just continued to use length, and > > just cast it on the fly to unsigned or to size_t when calling > > `UNICODE::as_utf8`. > > This is exactly the case I was referring to. The declaration here is: > > ``` > template<typename T> static char* as_utf8(const T* base, size_t& length); > ``` > > whether length is an IN/OUT parameter that is the int array length going in > (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. > The out coming utf8 length can theoretically by > INT_MAX but if that were > the case in this code (which expects to be dealing with names that can be > symbols hence < 64K) then that would be a programming error which the > checked_cast would catch. And of course new_symbol checks for < 64K. Oh, I completely missed that! I wish we would use pointers instead of references in cases like these since that would make the intent immediately clear when looking at the call site. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2317607555