------- Additional Comments From neil at daikokuya dot co dot uk  2005-02-21 23:00 -------
Subject: Re:  UCNs not recognized in identifiers (c++/c99)

jsm28 at gcc dot gnu dot org wrote:-

> * The greedy algorithm applies for lexing UCNs: for example,
> a\U0000000z is three preprocessing tokens {a}{\}{U0000000z} (and
> shouldn't get a diagnostic on lexing, presuming macros are defined
> such that the eventual token sequence is valid).

I'm not sure I agree with this: it seems to be unnecessary
extra work.  Further, I suspect the user would benefit more from
being told he entered an ill-formed UCN than from something
cryptic out of the front end complaining about an unexpected
backslash.

The only case where you wouldn't get a syntax error from the
front end, or an invalid escape in a literal, is with -E.  I'm
not sure lexing to the letter of the standard is worthwhile in
this case, as the standard doesn't discuss -E.

If you have an example where a program that lexes such a sequence
as multiple tokens is nonetheless acceptable when compiled, then I
would agree with you.

> * The spelling of UCNs is preserved for the # and ## operators.

This is very hard with CPP's current implementation - it assumes
it can deduce the spelling of an identifier from its hash table
entry.  IMO the proper way to fix this is to use a different
approach entirely, rather than kludge it into the existing
implementation (which would bloat some common data structures),
but that's some work.

> * I think the only reasonable interpretation of the lexing rules in
> the context of forbidden characters is that first identifiers are
> lexed (allowing any UCNs) then bad characters yield an error (rather
> than stopping the identifier before the bad character and treating it
> as not a UCN).

Agreed - as I say above, I don't see why this shouldn't apply to
partial UCNs too, even with -E.
 
The rest seems reasonable.

Neil.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449
