Command line options, in particular -D options, should be interpreted in
the locale character set (possibly subject to an -finput-charset
override).  At present, however, the expansion of a -D option is not
subject to any character set translation.

Consider the program

char *s = S;

compiled with the following command, with LC_CTYPE=en_GB.ISO-8859-1 in effect:

gcc -S -finput-charset=ISO-8859-1 -fexec-charset=UTF-8 -DS=\"§\" t.c

Here the string in the output program consists of a single byte rather
than being translated to UTF-8.  By contrast, the equivalent program,
encoded in ISO-8859-1

char *s = "§";

compiled with the same options, in the same locale, has a properly
converted UTF-8 string in the assembly output.

If extended identifiers are implemented (bug 9449), the same will apply
to the macro names and parameter names in -D and -U options, not just to
their expansions.  I think the -D and -U arguments should simply have
the same character set translations applied as source files do,
including, for C++, the conversion of extended characters to UCNs in
translation phase 1, once that is implemented for source files.

-- 
           Summary: -D option handling doesn't account for character sets
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: preprocessor
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jsm28 at gcc dot gnu dot org
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20183
