------- Additional Comments From zack at codesourcery dot com  2004-12-16 02:16 -------
Subject: Re: gcc and UCN in identifiers: bug PR 9449

Al Simons <[EMAIL PROTECTED]> writes:

> Hi, Zack.
>
> I'm looking into adding UCN support for identifiers into the HP
> C/C++ compiler, and wondered if there is any new status on your
> implementation / design? We'd like to do things the same way if at
> all possible.

I don't intend to implement this feature until the C committee, the
C++ committee, and the Unicode committee all agree on which Unicode
character sequences are legitimate in identifiers and what sort of
canonicalization is to be performed.

As long as there is no agreement, implementation of this feature
risks indeterminacy in shared library ABIs.  Suppose that the
identifier "get_length_in_Ångstroms" is part of a shared library's
public interface.  The Å might be U+212B, U+00C5, or U+0041 U+030A.
Suppose further that the person who implemented the shared library
used a text editor that generates NFD, so the library header reads
U+0041 U+030A.  But their compiler normalizes to NFC on input, so the
name in the shared library's symbol table reads U+00C5.  Now someone
comes along with a compiler that does no normalization whatsoever and
tries to use the library.  They're going to get a link error and
they're not going to know why.  Worse, if someone recompiles the
library with a compiler that chose to normalize to NFD, its ABI
silently changes.

Joseph Myers insists that this situation cannot arise, because
C99/C++'s lists of valid Unicode code points in identifiers exclude
all combining forms.  But if I enforce those rules users will hate the
compiler, because their text editors will generate what looks like
perfectly fine text and then the compiler will barf on it.  And I am
not prepared to trust that every editor on the planet will adhere to
C99/C++'s rules.  And even if I were, we'd still have the problem of
the C99 and C++ lists not being identical.

> There is a link in the bug report that appears to be broken; any
> chance you can hook it back up?
>
> <http://www.codesourcery.com/lists?2:mss:1481:danfdfbkjoaahbcmmeam>

My best guess is that this is now
<http://www.codesourcery.com/archives/cxx-abi-dev/msg00676.html>.
This is mostly about how to mangle non-ASCII characters in identifiers
to get them past limited linkers, and doesn't offer any help with the
problems I described above.

zw

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449
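
Illustration of the "get_length_in_Ångstroms" scenario described in the
message above.  This is only a sketch in Python, not anything GCC or the
HP compiler does: `symbol_bytes` is a hypothetical stand-in for "whatever
a compiler writes into the symbol table", assumed here to be the UTF-8
encoding of the identifier after an optional normalization pass.

```python
import unicodedata

# Three spellings that render identically; only the "Å" differs.
angstrom_sign = "get_length_in_\u212Bngstroms"   # U+212B ANGSTROM SIGN
precomposed   = "get_length_in_\u00C5ngstroms"   # U+00C5 (the NFC form)
decomposed    = "get_length_in_A\u030Angstroms"  # U+0041 U+030A (the NFD form)


def symbol_bytes(identifier, form=None):
    """Hypothetical model of symbol emission.

    form is None for a compiler that does no normalization, or "NFC"/"NFD"
    for a compiler that normalizes identifiers on input.
    """
    if form is not None:
        identifier = unicodedata.normalize(form, identifier)
    return identifier.encode("utf-8")


# The library header was saved by an NFD editor.
header_text = decomposed

# Library built with a compiler that normalizes to NFC.
exported = symbol_bytes(header_text, "NFC")

# Client built from the same header with a compiler that does not normalize.
wanted = symbol_bytes(header_text, None)

# Library rebuilt later with a compiler that normalizes to NFD.
rebuilt = symbol_bytes(header_text, "NFD")

print("client links against library: ", wanted == exported)
print("rebuild keeps the same symbol:", rebuilt == exported)
```

Both comparisons come out false: the client asks for the decomposed byte
sequence while the library exports the precomposed one, and the NFD
rebuild changes the exported name without any change to the source.
Normalizing all three spellings to one form collapses them to a single
symbol, which is why the choice of form has to be agreed on rather than
left to each implementation.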
