[Bug c/67224] UTF-8 support for identifier names in GCC

joseph at codesourcery dot com Sat, 15 Aug 2015 05:26:09 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224


--- Comment #5 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
There is no "C99" character set in glibc libiconv (after all, it is not a 
character set at all).  In any case, converting extended characters to 
UCNs like that would be correct for C++ (provided you also convert $, `, @ 
and control characters other than those in the basic source character 
set), but not for C.  Even for C++, it would be necessary to keep track of 
the conversions so that they could be reverted in raw string literals.  
That requirement (in C++14, see 2.5 [lex.pptoken] paragraph 3: "Between 
the initial and final double quote characters of the raw string, any 
transformations performed in phases 1 and 2 (trigraphs, 
universal-character-names, and line splicing) are reverted; this 
reversion shall apply before any d-char, r-char, or delimiting parenthesis 
is identified.") makes such an approach non-viable, because it would break 
things that currently work.  The conversions to UCNs have to take place 
within cpplib, not through an external iconv conversion.

Note that cpplib identifier spelling preservation is now implemented 
<https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00548.html>, which adds 
other ways in which it should be visible whether an identifier was 
represented with UTF-8 or UCNs.
