On Thu, 29 Aug 2024 08:38:41 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:
> > > Many of these translations seem awkward, since they convert to size_t > > > only to then convert back to int. > > > > Can you be more specific here please? > > Certainly. For example, this construct: > > ``` > size_t utf8_len = static_cast<size_t>(length); > const char* base = UNICODE::as_utf8(position, utf8_len); > Symbol* sym = SymbolTable::new_symbol(base, checked_cast<int>(utf8_len)); > ``` > > We introduce `utf8_len` as a `size_t` synonym for len, but since it > originates from an `int`, its length must be <= INT_MIN. We also assume it is > >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` and > `SymbolTable::new_symbol`. The former takes a size_t, but since we rely on > `length` >= 0, we could just as well give it `length`. For > `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a > check. The check feels superfluous since `utf8_len` came from an int. > > I assume this verbosity is for the benefit of the code reader, to make intent > clear. Otherwise, we could have just continued to use length, and just cast > it on the fly to unsigned or to size_t when calling `UNICODE::as_utf8`. This is exactly the case I was referring to. The declaration here is: template<typename T> static char* as_utf8(const T* base, size_t& length); whether length is an IN/OUT parameter that is the int array length going in (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. The out coming utf8 length can theoretically by > INT_MAX but if that were the case in this code (which expects to be dealing with names that can be symbols hence < 64K) then that would be a programming error which the checked_cast would catch. And of course new_symbol checks for < 64K. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2317534674