On Thu, 29 Aug 2024 13:06:19 GMT, Thomas Stuefe <stu...@openjdk.org> wrote:
>>> > > Many of these translations seem awkward, since they convert to size_t >>> > > only to then convert back to int. >>> > >>> > Can you be more specific here please? >>> >>> Certainly. For example, this construct: >>> >>> ``` >>> size_t utf8_len = static_cast<size_t>(length); >>> const char* base = UNICODE::as_utf8(position, utf8_len); >>> Symbol* sym = SymbolTable::new_symbol(base, >>> checked_cast<int>(utf8_len)); >>> ``` >>> >>> We introduce `utf8_len` as a `size_t` synonym for len, but since it >>> originates from an `int`, its length must be <= INT_MIN. We also assume it >>> is >=0, but we don't check. We feed `utf8_len` into both `UNICODE::as_utf8` >>> and `SymbolTable::new_symbol`. The former takes a size_t, but since we rely >>> on `length` >= 0, we could just as well give it `length`. For >>> `SymbolTable::new_symbol`, we translate `utf8_len` back to `int`, with a >>> check. The check feels superfluous since `utf8_len` came from an int. >>> >>> I assume this verbosity is for the benefit of the code reader, to make >>> intent clear. Otherwise, we could have just continued to use length, and >>> just cast it on the fly to unsigned or to size_t when calling >>> `UNICODE::as_utf8`. >> >> This is exactly the case I was referring to. The declaration here is: >> >> template<typename T> static char* as_utf8(const T* base, size_t& length); >> >> whether length is an IN/OUT parameter that is the int array length going in >> (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. >> The out coming utf8 length can theoretically by > INT_MAX but if that were >> the case in this code (which expects to be dealing with names that can be >> symbols hence < 64K) then that would be a programming error which the >> checked_cast would catch. And of course new_symbol checks for < 64K. > >> > > > Many of these translations seem awkward, since they convert to size_t >> > > > only to then convert back to int. >> > > >> > > >> > > Can you be more specific here please? >> > >> > >> > Certainly. For example, this construct: >> > ``` >> > size_t utf8_len = static_cast<size_t>(length); >> > const char* base = UNICODE::as_utf8(position, utf8_len); >> > Symbol* sym = SymbolTable::new_symbol(base, >> > checked_cast<int>(utf8_len)); >> > ``` >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > We introduce `utf8_len` as a `size_t` synonym for len, but since it >> > originates from an `int`, its length must be <= INT_MIN. We also assume it >> > is >=0, but we don't check. We feed `utf8_len` into both >> > `UNICODE::as_utf8` and `SymbolTable::new_symbol`. The former takes a >> > size_t, but since we rely on `length` >= 0, we could just as well give it >> > `length`. For `SymbolTable::new_symbol`, we translate `utf8_len` back to >> > `int`, with a check. The check feels superfluous since `utf8_len` came >> > from an int. >> > I assume this verbosity is for the benefit of the code reader, to make >> > intent clear. Otherwise, we could have just continued to use length, and >> > just cast it on the fly to unsigned or to size_t when calling >> > `UNICODE::as_utf8`. >> >> This is exactly the case I was referring to. The declaration here is: >> >> ``` >> template<typename T> static char* as_utf8(const T* base, size_t& length); >> ``` >> >> whether length is an IN/OUT parameter that is the int array length going in >> (hence >= 0 and <= INT_MAX), and the size_t utf8 sequence length coming out. >> The out coming utf8 length can theoretically by > INT_MAX but if that were >> the case in this code (which expects to be dealing with names that can be >> symbols hence < 64K) then that would be a programming error which the >> checked_cast would catch. And of course new_symbol checks for < 64K. > > Oh, I completely missed that! > > I wish we would use pointers instead of references in cases like these since > that would make the intent immediately clear when looking at the call site. Thanks for the review and discussion @tstuefe . ------------- PR Comment: https://git.openjdk.org/jdk/pull/20560#issuecomment-2318932417