------- Additional Comments From neil at daikokuya dot co dot uk 2005-02-21 23:00 -------
Subject: Re: UCNs not recognized in identifiers (c++/c99)
jsm28 at gcc dot gnu dot org wrote:-

> * The greedy algorithm applies for lexing UCNs: for example,
> a\U0000000z is three preprocessing tokens {a}{\}{U0000000z} (and
> shouldn't get a diagnostic on lexing, presuming macros are defined
> such that the eventual token sequence is valid).

I'm not sure I agree with this: it would seem to be unnecessary extra
work; further, I suspect the user would benefit from being told he
entered an ill-formed UCN rather than getting a complaint from the
front end about an unexpected backslash.  The only case where you
wouldn't get a syntax error from the front end, or an invalid escape
in a literal, is with -E.  I'm not sure lexing to the letter of the
standard is worthwhile in this case, as the standard doesn't discuss
-E.  If you have an example where a compiled program is acceptable
with the multiple-token lexing then I would agree with you.

> * The spelling of UCNs is preserved for the # and ## operators.

This is very hard with CPP's current implementation - it assumes it
can deduce the spelling of an identifier from its hash table entry.
IMO the proper way to fix this is to use a different approach
entirely, rather than kludge it into the existing implementation
(which would bloat some common data structures), but that's some
work.

> * I think the only reasonable interpretation of the lexing rules in
> the context of forbidden characters is that first identifiers are
> lexed (allowing any UCNs) then bad characters yield an error (rather
> than stopping the identifier before the bad character and treating
> it as not a UCN).

Agreed - as I say above, I don't see why this shouldn't apply to
partial UCNs too, even with -E.

The rest seems reasonable.

Neil.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449