On 8/17/22 14:19, Jakub Jelinek wrote:
On Wed, Aug 17, 2022 at 04:47:19PM -0400, Jason Merrill via Gcc-patches wrote:
+         length = 32;

/* Magic value to indicate no digits seen.  */

Indeed, will add the comment.

+         delimited = true;
+         if (loc_reader)
+           char_range->m_finish = loc_reader->get_next ().m_finish;
+       }
+    }
     else if (str[-1] == 'U')
       length = 8;
     else
@@ -1107,6 +1118,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const
     result = 0;
     do
       {
+      if (str == limit)
+       break;
         c = *str;
         if (!ISXDIGIT (c))
        break;
@@ -1116,9 +1129,41 @@ _cpp_valid_ucn (cpp_reader *pfile, const
          gcc_assert (char_range);
          char_range->m_finish = loc_reader->get_next ().m_finish;
        }
+      if (delimited)
+       {
+         if (!result)
+           /* Accept arbitrary number of leading zeros.  */
+           length = 16;
+         else if (length == 8)
+           {
+             /* Make sure we detect overflows.  */
+             result |= 0x8000000;
+             ++length;
+           }

16 above so that this case happens after we read 8 digits after leading
zeroes?

Another magic value less than the no digits seen one and >8,
so that it can count 8 digits with the first non-zero one after
which to or in the overflow flag.  The intent is not to break the loop
if there are further digits, just that there will be overflow.
Another option would be those overflow |= n ^ (n << 4 >> 4);
tests that convert_hex does and just making sure length is never decremented
(except we need a way to distinguish between \u{} and at least one digit).

This way is fine, could just use more comment.

+      if (loc_reader)
+       char_range->m_finish = loc_reader->get_next ().m_finish;

Here and in other functions, the pattern of increment the input pointer and
update m_finish seems like it should be a macro?

Perhaps or inline function.  Before my patch, there are 5 such ifs
(some with char_range.m_finish and others char_range->m_finish),
the patch adds another 7 such spots.

Either way is fine.

@@ -2119,15 +2255,23 @@ _cpp_interpret_identifier (cpp_reader *p
        cppchar_t value = 0;
        size_t bufleft = len - (bufp - buf);
        int rval;
+       bool delimited = false;
        idp += 2;
+       if (length == 4 && id[idp] == '{')
+         {
+           delimited = true;
+           idp++;
+         }
        while (length && idp < len && ISXDIGIT (id[idp]))
          {
            value = (value << 4) + hex_value (id[idp]);
            idp++;
-           length--;
+           if (!delimited)
+             length--;
          }
-       idp--;
+       if (!delimited)
+         idp--;

Don't we need to check that the first non-xdigit is a }?

The comments and my understanding of the code say that we first
check what is a valid identifier and the above is only called on
a valid identifier.  So, if it would be delimited \u{ not terminated
with }, then it would fail forms_identifier_p and wouldn't be included
in the range.  Thus e.g. the ISXDIGIT (id[id]) test is probably not needed
unless delimited is true because we've checked earlier that it has 4 or 8
hex digits.
But sure, if you want a id[idp] == '}' test or assertion, it can be
added.

OK, a comment mentioning this should be sufficient.

Jason

Reply via email to