------- Additional Comments From joseph at codesourcery dot com 2004-12-16 12:33 -------
Subject: Re: UCNs not recognized in identifiers (c++/c99)

On Thu, 16 Dec 2004, zack at codesourcery dot com wrote:

> Because of the ABI implications, I consider it completely unacceptable

Which ABI implications?

(a) It isn't explicitly stated that different UCNs designating the same character are equivalent to each other (and to that character) in identifiers, but I don't think there's any real doubt that they are meant to be equivalent.

(b) There is no normalisation, but I'm confident that the answer from WG14 if this is queried would be that the standard is correct and by design it normatively references ISO 10646 (not Unicode) which doesn't include the normalisation definitions of UAX 15 and implementation of the standard is not meant to involve large external tables. If there are cases of ambiguity a -Wnfc option (default on) to warn for identifiers not in NFC (or indeed -Wnfkc, default on, for identifiers not in NFKC) would draw users' attention to doubtful identifiers. (TR 10176 expressly notes the problems of ambiguity of appearance of entirely different characters even without combining characters, says that language standards need not provide for normalisation if they allow combining characters, and excludes most combining characters where precombined characters are available for the specific purpose of avoiding alternate representations of identifiers.)

(c) Though we could do what we want with extended characters (as opposed to UCNs) in source files in phase 1, it seems safest to err on the side of rejecting all extended characters that wouldn't be accepted as UCNs, rather than e.g. applying NFC, to avoid giving identifiers with such characters a meaning which might then need to be preserved in future.

(d) There are genuine ABI issues with how extended characters are represented in object files, but I think those need to be resolved by selecting between UTF-8 and mangling (default UTF-8) based on target configurations rather than on the capabilities of the assembler and linker in use, and by getting an explicit statement about encoding put in the ELF specification.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449
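
To make points (a) and (b) concrete, here is a rough sketch in Python (not GCC code) of treating different UCN spellings of one character as the same identifier, and of the kind of check a -Wnfc style warning might perform. The names decode_ucns and nfc_warning are hypothetical, and -Wnfc is only the option proposed above; the sketch assumes identifiers arrive as plain strings with UCNs still spelled out as \uXXXX or \UXXXXXXXX.

```python
import re
import unicodedata

# Matches the two UCN spellings: \uXXXX (4 hex digits) and \UXXXXXXXX (8 hex digits).
_UCN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_ucns(spelling):
    """Replace every UCN in an identifier spelling with the character it designates.

    Hypothetical helper: it does not check that the character is one the
    language standards actually allow in identifiers.
    """
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _UCN.sub(replace, spelling)


def nfc_warning(spelling):
    """Return a warning string if the decoded identifier is not in NFC, else None.

    This only illustrates what a -Wnfc style diagnostic could test; it warns
    and never rewrites the identifier, matching the "no normalisation" reading.
    """
    identifier = decode_ucns(spelling)
    if unicodedata.normalize("NFC", identifier) != identifier:
        return "identifier %r is not in NFC" % spelling
    return None


if __name__ == "__main__":
    # Point (a): two UCN spellings of U+00E9 name the same identifier.
    print(decode_ucns(r"caf\u00e9") == decode_ucns(r"caf\U000000E9"))
    # Point (b): a precombined character passes, a combining sequence is flagged.
    print(nfc_warning(r"caf\u00e9"))
    print(nfc_warning(r"cafe\u0301"))
```

Note that the check only reports; an identifier written with a combining sequence stays distinct from its precombined form, which is the behaviour argued for in (b) and (c).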