C99, but not C11, C++98, C++03 or C++11, disallows identifiers that start with a universal character name for a digit. The cpplib logic for this takes the "digit" property from Unicode data, but that data disagrees with C99 Annex D, which classifies Roman numerals (2160-2182), IDEOGRAPHIC NUMBER ZERO (3007) and Suzhou numerals (3021-3029) as special characters rather than digits.
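
For illustration only (this is not part of the patch), the C99 notion of "digit" can be sketched as a plain range lookup. The ranges below are the ones listed under [C99DIG] in the ucnid.tab hunk of this patch; the struct and function names are hypothetical and do not exist in cpplib, which works from the generated ucnid.h table instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range of code points.  Hypothetical type, not from cpplib.  */
struct ucn_range { unsigned int lo, hi; };

/* Ranges copied from the [C99DIG] section shown in the ucnid.tab hunk.
   Assumption: that section carries the full C99 Annex D digit list.  */
static const struct ucn_range c99_digit_ranges[] = {
  { 0x0660, 0x0669 }, { 0x06f0, 0x06f9 }, { 0x0966, 0x096f },
  { 0x09e6, 0x09ef }, { 0x0a66, 0x0a6f }, { 0x0ae6, 0x0aef },
  { 0x0b66, 0x0b6f }, { 0x0be7, 0x0bef }, { 0x0c66, 0x0c6f },
  { 0x0ce6, 0x0cef }, { 0x0d66, 0x0d6f }, { 0x0e50, 0x0e59 },
  { 0x0ed0, 0x0ed9 }, { 0x0f20, 0x0f33 }
};

/* Return true if code point C falls in one of the ranges above, i.e. it
   may not start an identifier in C99.  Hypothetical helper.  */
static bool
is_c99_digit_ucn (unsigned int c)
{
  size_t n = sizeof c99_digit_ranges / sizeof c99_digit_ranges[0];
  for (size_t i = 0; i < n; i++)
    if (c >= c99_digit_ranges[i].lo && c <= c99_digit_ranges[i].hi)
      return true;
  return false;
}
```

With this definition, \u0660 is rejected as an initial character, while \u2160, \u3007 and \u3029 are not digits and so remain valid at the start of an identifier, which is what the new test checks.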
This patch fixes cpplib to follow C99's definition of digit. C++98/C++03 have no restrictions on initial characters. C11 and C++11 have identical lists of permitted characters and of forbidden initial characters, which differ from the lists in C99 and C++98/C++03; this patch is preliminary to implementing support for the C11/C++11 lists. In those lists, the forbidden initial characters appear to be combining characters instead of digits. (So I'll probably change the C99, DIG, CXX flags in the followup to C99, N99 (meaning non-initial character in C99), CXX (i.e. C++98/C++03), C11, N11.)

The new lists generally include large ranges of characters which may not all be allocated in a particular Unicode version. This means it will be necessary to update the character composition information for -Wnormalized= from Unicode data from time to time, whereas that hasn't mattered so much with the old, smaller lists of characters.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu. Applied to mainline.

gcc/testsuite:
2013-11-15  Joseph Myers  <jos...@codesourcery.com>

	* gcc.dg/cpp/ucnid-9.c: New test.

libcpp:
2013-11-15  Joseph Myers  <jos...@codesourcery.com>

	* ucnid.tab: Mark C99 digits as [C99DIG].
	* makeucnid.c (read_ucnid): Handle [C99DIG].
	(read_table): Don't check for digit characters.
	* ucnid.h: Regenerate.

Index: libcpp/makeucnid.c
===================================================================
--- libcpp/makeucnid.c	(revision 204827)
+++ libcpp/makeucnid.c	(working copy)
@@ -66,6 +66,8 @@ read_ucnid (const char *fname)
         break;
       if (strcmp (line, "[C99]\n") == 0)
         fl = C99;
+      if (strcmp (line, "[C99DIG]\n") == 0)
+        fl = C99|digit;
       else if (strcmp (line, "[CXX]\n") == 0)
         fl = CXX;
       else if (isxdigit (line[0]))
@@ -104,10 +106,10 @@ read_ucnid (const char *fname)
   fclose (f);
 }
 
-/* Read UnicodeData.txt and set the 'digit' flag, and
-   also fill in the 'decomp' table to be the decompositions of
-   characters for which both the character decomposed and all the code
-   points in the decomposition are either C99 or CXX.  */
+/* Read UnicodeData.txt and fill in the 'decomp' table to be the
+   decompositions of characters for which both the character
+   decomposed and all the code points in the decomposition are either
+   C99 or CXX.  */
 
 static void
 read_table (char *fname)
@@ -135,11 +137,7 @@ read_table (char *fname)
       do {
         l++;
       } while (*l != ';');
-      /* Category value; things starting with 'N' are numbers of some
-         kind.  */
-      if (*++l == 'N')
-        flags[codepoint] |= digit;
-
+      /* Category value.  */
       do {
         l++;
       } while (*l != ';');
Index: libcpp/ucnid.h
===================================================================
--- libcpp/ucnid.h	(revision 204827)
+++ libcpp/ucnid.h	(working copy)
@@ -714,13 +714,12 @@
 {   0|  0|  0|CID|NFC|NKC|  0,   0,  0x2132 },
 { C99|  0|  0|CID|NFC|  0|  0,   0,  0x2138 },
 {   0|  0|  0|CID|NFC|  0|  0,   0,  0x215f },
-{ C99|DIG|  0|CID|NFC|  0|  0,   0,  0x217f },
-{ C99|DIG|  0|CID|NFC|NKC|  0,   0,  0x2182 },
+{ C99|  0|  0|CID|NFC|  0|  0,   0,  0x217f },
+{ C99|  0|  0|CID|NFC|NKC|  0,   0,  0x2182 },
 {   0|  0|  0|CID|NFC|NKC|  0,   0,  0x3004 },
-{ C99|  0|  0|CID|NFC|NKC|  0,   0,  0x3006 },
-{ C99|DIG|  0|CID|NFC|NKC|  0,   0,  0x3007 },
+{ C99|  0|  0|CID|NFC|NKC|  0,   0,  0x3007 },
 {   0|  0|  0|CID|NFC|NKC|  0,   0,  0x3020 },
-{ C99|DIG|  0|CID|NFC|NKC|  0,   0,  0x3029 },
+{ C99|  0|  0|CID|NFC|NKC|  0,   0,  0x3029 },
 {   0|  0|  0|CID|NFC|NKC|  0,   0,  0x3040 },
 { C99|  0|CXX|CID|NFC|NKC|  0,   0,  0x3093 },
 {   0|  0|CXX|CID|NFC|NKC|  0,   0,  0x3094 },
Index: libcpp/ucnid.tab
===================================================================
--- libcpp/ucnid.tab	(revision 204827)
+++ libcpp/ucnid.tab	(working copy)
@@ -119,7 +119,7 @@ ac00-d7a3
 0b3d 1fbe 203f-2040 2102 2107 210a-2113 2115 2118-211d 2124 2126 2128
 212a-2131 2133-2138 2160-2182 3005-3007 3021-3029
 
-; Digits
+[C99DIG]
 0660-0669 06f0-06f9 0966-096f 09e6-09ef 0a66-0a6f 0ae6-0aef 0b66-0b6f
 0be7-0bef 0c66-0c6f 0ce6-0cef 0d66-0d6f 0e50-0e59 0ed0-0ed9 0f20-0f33
 
Index: gcc/testsuite/gcc.dg/cpp/ucnid-9.c
===================================================================
--- gcc/testsuite/gcc.dg/cpp/ucnid-9.c	(revision 0)
+++ gcc/testsuite/gcc.dg/cpp/ucnid-9.c	(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-do preprocess } */
+/* { dg-options "-std=c99 -pedantic -fextended-identifiers" } */
+
+\u2160
+\u2182
+\u3007
+\u3021
+\u3029

-- 
Joseph S. Myers
jos...@codesourcery.com