https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224
--- Comment #7 from Eric <ejolson at unr dot edu> ---
Please see the Raspberry Pi forum post linked in the original report for more
information about testing this patch. As described there, both command-line
options
-finput-charset=UTF-8 -fextended-identifiers
are needed to compile a UTF-8 input file containing Unicode identifiers. I have
attached a small test program as well.
Searching for "UTF-8 identifiers in GCC" turns up a number of people asking for
this feature, as well as additional example code that uses UTF-8 identifiers.
The document "Unicode for the PCC C99 Compiler", available at
http://pcc.ludd.ltu.se/documentation/
also contains example UTF-8 C99 input files that can be used to test the
compiler. Beyond that, the one-line patch has been tested in the sense that the
compiler still bootstraps and compiles thousands of lines of standard ASCII C
input without trouble.
The patch inserts "C99" in only one place because SOURCE_CHARSET is used for
conflicting purposes, and changing every use to "C99" does not yield a working
solution. In particular, the "C99" in _cpp_convert_input should not be
understood as the character set of the input files, but rather as an internal
character set suitable for later parsing. Since iconv is already a
well-debugged library, the risks of this patch appear to be minor.
Note, however, the following problem: "C99" is probably not the correct choice
for EBCDIC hosts. There it might be possible to write UCNs using trigraphs of
the form ??/uXXXX and ??/UXXXXXXXX. However, since the number of people wanting
to compile C source files with identifiers encoded in UTF-EBCDIC is likely
zero, the easiest way forward is to modify the patch so that it applies only to
non-EBCDIC hosts. Because the code already contains #ifdefs that check for
this, doing so adds no new complexity to the code base.