On Wed, Aug 17, 2022 at 04:47:19PM -0400, Jason Merrill via Gcc-patches wrote: > > + length = 32; > > /* Magic value to indicate no digits seen. */
Indeed, will add the comment. > > + delimited = true; > > + if (loc_reader) > > + char_range->m_finish = loc_reader->get_next ().m_finish; > > + } > > + } > > else if (str[-1] == 'U') > > length = 8; > > else > > @@ -1107,6 +1118,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const > > result = 0; > > do > > { > > + if (str == limit) > > + break; > > c = *str; > > if (!ISXDIGIT (c)) > > break; > > @@ -1116,9 +1129,41 @@ _cpp_valid_ucn (cpp_reader *pfile, const > > gcc_assert (char_range); > > char_range->m_finish = loc_reader->get_next ().m_finish; > > } > > + if (delimited) > > + { > > + if (!result) > > + /* Accept arbitrary number of leading zeros. */ > > + length = 16; > > + else if (length == 8) > > + { > > + /* Make sure we detect overflows. */ > > + result |= 0x8000000; > > + ++length; > > + } > > 16 above so that this case happens after we read 8 digits after leading > zeroes? Another magic value less than the no digits seen one and > 8, so that it can count 8 digits with the first non-zero one after which to or in the overflow flag. The intent is not to break the loop if there are further digits, just that there will be overflow. Another option would be those overflow |= n ^ (n << 4 >> 4); tests that convert_hex does and just making sure length is never decremented (except we need a way to distinguish between \u{} and at least one digit). > > + if (loc_reader) > > + char_range->m_finish = loc_reader->get_next ().m_finish; > > Here and in other functions, the pattern of increment the input pointer and > update m_finish seems like it should be a macro? Perhaps or inline function. Before my patch, there are 5 such ifs (some with char_range.m_finish and others char_range->m_finish), the patch adds another 7 such spots. > > @@ -2119,15 +2255,23 @@ _cpp_interpret_identifier (cpp_reader *p > > cppchar_t value = 0; > > size_t bufleft = len - (bufp - buf); > > int rval; > > + bool delimited = false; > > idp += 2; > > + if (length == 4 && id[idp] == '{') > > + { > > + delimited = true; > > + idp++; > > + } > > while (length && idp < len && ISXDIGIT (id[idp])) > > { > > value = (value << 4) + hex_value (id[idp]); > > idp++; > > - length--; > > + if (!delimited) > > + length--; > > } > > - idp--; > > + if (!delimited) > > + idp--; > > Don't we need to check that the first non-xdigit is a }? The comments and my understanding of the code say that we first check what is a valid identifier and the above is only called on a valid identifier. So, if it would be delimited \u{ not terminated with }, then it would fail forms_identifier_p and wouldn't be included in the range. Thus e.g. the ISXDIGIT (id[id]) test is probably not needed unless delimited is true because we've checked earlier that it has 4 or 8 hex digits. But sure, if you want a id[idp] == '}' test or assertion, it can be added. Jakub