------- Additional Comments From joseph at codesourcery dot com  2004-12-16 12:33 -------
Subject: Re: UCNs not recognized in identifiers (c++/c99)

On Thu, 16 Dec 2004, zack at codesourcery dot com wrote:

> Because of the ABI implications, I consider it completely unacceptable

Which ABI implications?

(a) It isn't explicitly stated that different UCNs designating the same 
character are equivalent to each other (and to that character) in 
identifiers, but I don't think there's any real doubt that they are meant 
to be equivalent.
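
To illustrate (a), here is a minimal Python sketch of the equivalence being
assumed: each UCN spelling is replaced by the character it designates before
identifiers are compared. The helper names decode_ucns and same_identifier
are hypothetical and are not anything in GCC or cpplib; the sketch only
handles the \uXXXX and \UXXXXXXXX forms and does no validity checking of the
designated character.

```python
import re

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits), the two UCN spellings.
_UCN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_ucns(identifier):
    """Replace every UCN in `identifier` with the character it designates."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _UCN.sub(replace, identifier)


def same_identifier(a, b):
    """Compare two identifier spellings after decoding UCNs.

    No normalisation is applied, so a precomposed character and a
    base-plus-combining sequence remain different identifiers.
    """
    return decode_ucns(a) == decode_ucns(b)
```

Under this model r"caf\u00e9", r"caf\U000000E9" and the same name written
with the character itself all compare equal, while a spelling that uses a
combining accent does not.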

(b) There is no normalisation, but I'm confident that WG14, if queried, 
would answer that the standard is correct as written.  By design it 
normatively references ISO 10646 (not Unicode), which doesn't include the 
normalisation definitions of UAX 15, and implementing the standard is not 
meant to involve large external tables.  If there are cases of ambiguity, 
a -Wnfc option (default on) to warn for identifiers not in NFC (or indeed 
-Wnfkc, default on, for identifiers not in NFKC) would draw users' 
attention to doubtful identifiers.  (TR 10176 expressly notes the 
problems of ambiguity of appearance of entirely different characters even 
without combining characters, says that language standards need not 
provide for normalisation if they allow combining characters, and excludes 
most combining characters where precombined characters are available for 
the specific purpose of avoiding alternate representations of 
identifiers.)
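
As a rough sketch of what the warnings suggested in (b) would check, the
following Python uses the standard unicodedata module to test whether an
identifier is already in NFC or NFKC. The function name
normalisation_warnings and the message text are assumptions made for
illustration; only the option names come from the suggestion above.

```python
import unicodedata


def normalisation_warnings(identifier):
    """Return one warning string per normalisation form the identifier fails."""
    warnings = []
    for form, flag in (("NFC", "-Wnfc"), ("NFKC", "-Wnfkc")):
        # An identifier is in a given form iff normalising leaves it unchanged.
        if unicodedata.normalize(form, identifier) != identifier:
            warnings.append(
                "%s: identifier %r is not in %s" % (flag, identifier, form)
            )
    return warnings
```

An identifier spelled with e followed by U+0301 would trigger both warnings,
one using the U+FB01 ligature would trigger only the NFKC one, and one using
precomposed U+00E9 would trigger neither.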

(c) Though we could do what we want with extended characters (as opposed 
to UCNs) in source files in phase 1, it seems safest to err on the side of 
rejecting all extended characters that wouldn't be accepted as UCNs, 
rather than e.g. applying NFC, to avoid giving identifiers with such 
characters a meaning which might then need to be preserved in future.

(d) There are genuine ABI issues with how extended characters are 
represented in object files, but I think those need to be resolved by 
selecting between UTF-8 and mangling (default UTF-8) based on target 
configurations rather than on the capabilities of the assembler and linker 
in use, and by getting an explicit statement about encoding put in the ELF 
specification.
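
To make the choice in (d) concrete, here is a small Python sketch of the two
ways an identifier could become a symbol name. The function symbol_name, the
mode strings, and in particular the _UXXXXXXXX escape are purely
hypothetical; this is not a mangling that GCC or any ABI document defines.

```python
def symbol_name(identifier, mode="utf8"):
    """Return the bytes that would be emitted as the symbol name.

    mode="utf8":   emit the identifier as UTF-8 (the suggested default).
    mode="mangle": emit ASCII only, escaping each non-ASCII character with a
                   made-up _UXXXXXXXX form (illustrative assumption).
    """
    if mode == "utf8":
        return identifier.encode("utf-8")
    if mode == "mangle":
        escaped = "".join(
            c if ord(c) < 0x80 else "_U%08X" % ord(c) for c in identifier
        )
        return escaped.encode("ascii")
    raise ValueError("unknown mode: %r" % (mode,))
```

The mode would be fixed per target configuration rather than probed from the
assembler and linker in use. A real mangling would also have to be
unambiguous; this toy escape is not, since an ASCII identifier could already
contain the same _U sequence.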



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449
