------- Additional Comments From geoffk at geoffk dot org 2005-09-16 00:01 -------
Subject: Re: UCNs not recognized in identifiers (c++/c99)

On 15/09/2005, at 3:53 PM, joseph at codesourcery dot com wrote:

> Yes, "spelling" is meant in terms of the source code characters.
> The idea is to permit simple strcmp-like checking by the
> preprocessor.

Good, so that answers that question.

You raise a good point about GCC not having documentation for phase 1. I don't have time to write all of it, but I think I can write the last part, about UCNs, so maybe together we can get it all done. My proposed wording is:

@cite{The mapping between physical source file multibyte characters and the source character set in translation phase 1 (C90 and C99 5.1.1.2).}

[CR/NL/CR-NL are turned into EOL markers, spaces are deleted between backslash and the end of a line, it's converted to UTF-8 using iconv based on -finput-charset---and what else?]

Then, any character sequence which would form a UCN in an identifier in phase 3 of translation is converted into the corresponding UTF-8 sequence. Any backslash-newline combinations in the UCN are preserved and placed after the UTF-8 sequence. [note that there's no way for a user to tell whether a backslash-newline combination is placed before, in the middle of, or after, the UTF-8 sequence.]

...

@cite{Which additional multibyte characters may appear in identifiers and their correspondence to universal character names (C99 6.4.2).}

UTF-8 character sequences may appear in identifiers, and they correspond to the UCN that specifies that character. A UTF-8 sequence may appear only if the UCN that it corresponds to would be permitted in the identifier at that point.

At present, only those UTF-8 sequences which were produced by the mapping from UCNs to UTF-8 sequences in translation phase 1 are permitted, but this is likely to change in the future.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449
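
To make the proposed phase 1 wording above concrete, here is a minimal sketch in C of the UCN-to-UTF-8 step: it reads one \uXXXX or \UXXXXXXXX, skips any backslash-newline pairs found inside it, and emits the UTF-8 bytes followed by those backslash-newline pairs. The names convert_ucn, encode_utf8, hex_value and skip_splices are hypothetical and are not taken from GCC's sources; the sketch also does not check whether the character is one that C99 permits in an identifier, so treat it as an illustration of the wording rather than of the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes).
   Returns the byte count, or 0 for surrogates and values above 0x10FFFF. */
static size_t encode_utf8(uint32_t cp, char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Value of hex digit c, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Advance *i past backslash-newline pairs in s; return how many were skipped. */
static size_t skip_splices(const char *s, size_t *i)
{
    size_t n = 0;
    while (s[*i] == '\\' && s[*i + 1] == '\n') {
        *i += 2;
        n++;
    }
    return n;
}

/* Hypothetical helper (not GCC code): convert the UCN at the start of the
   NUL-terminated string s. Writes the UTF-8 bytes, then every
   backslash-newline pair that appeared inside the UCN, then a NUL, to out.
   Returns the number of bytes written before the NUL and stores the number
   of input characters read in *consumed; returns 0 if s does not start
   with a valid UCN or out is too small. */
static size_t convert_ucn(const char *s, char *out, size_t out_size,
                          size_t *consumed)
{
    size_t i = 0, splices = 0, len;
    uint32_t cp = 0;
    int digits;
    char utf8[4];

    if (s[i++] != '\\')
        return 0;
    splices += skip_splices(s, &i);
    if (s[i] == 'u')
        digits = 4;
    else if (s[i] == 'U')
        digits = 8;
    else
        return 0;
    i++;
    while (digits-- > 0) {
        int v;
        splices += skip_splices(s, &i);
        v = hex_value(s[i]);
        if (v < 0)
            return 0;
        cp = (cp << 4) | (uint32_t)v;
        i++;
    }
    len = encode_utf8(cp, utf8);
    if (len == 0 || len + 2 * splices + 1 > out_size)
        return 0;
    memcpy(out, utf8, len);
    while (splices-- > 0) {
        out[len++] = '\\';
        out[len++] = '\n';
    }
    out[len] = '\0';
    *consumed = i;
    return len;
}
```

For example, calling convert_ucn on the eight characters backslash, u, 0, 0, backslash, newline, c, 1 would write the two UTF-8 bytes for U+00C1 followed by the backslash-newline pair. Putting the pair after the UTF-8 bytes matches the proposed wording; as the note in the wording says, a user cannot observe where it ends up.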