aaron.ballman added a reviewer: rsmith.
aaron.ballman added a subscriber: rsmith.
aaron.ballman added a comment.
In general, I'm in favor of these changes. They help identify (pun *totally*
intended) where we're improperly expecting ASCII identifiers in places, which
can hopefully be addressed in follow-up work. @rsmith, do you have any concerns
with this direction?
Can you remove the [WIP] from the title so it's clear that this is no longer in
progress? Also, I'd recommend slapping an NFC in the title somewhere to make it
clear there's no functional changes intended.
================
Comment at: clang/include/clang/Lex/Lexer.h:702
// Helper functions to lex the remainder of a token of the specific type.
- bool LexIdentifier (Token &Result, const char *CurPtr);
+ bool LexIdentifierContinue(Token &Result, const char *CurPtr);
bool LexNumericConstant (Token &Result, const char *CurPtr);
----------------
Should this be `LexUnicodeIdentifierContinue()`? If so, perhaps it can also be
moved up to line 578 so it's near the "start" function?
Or does this function handle both Unicode and ASCII identifiers? If so, the
comments could probably be updated.
================
Comment at: clang/lib/Lex/Lexer.cpp:1758
+bool Lexer::LexIdentifierContinue(Token &Result, const char *CurPtr) {
+ // Match [_A-Za-z0-9]*, we have already matched [_A-Za-z$]
+ unsigned Size;
----------------
Is the comment here still accurate? Might be worth rewriting in prose rather
than regex?
================
Comment at: clang/lib/Lex/Lexer.cpp:1762
+ unsigned char C = *CurPtr;
+ // Fast path
+ if (isAsciiIdentifierContinue(C)) {
----------------
================
Comment at: clang/lib/Lex/Lexer.cpp:1767
+ }
+ // Slow path: handle trigraph, unicode codepoints, UCNs
+ C = getCharAndSize(CurPtr, Size);
----------------
================
Comment at: clang/lib/Lex/Lexer.cpp:1783-1788
+ if (C == '\\' && tryConsumeIdentifierUCN(CurPtr, Size, Result)) {
continue;
- } else if (!isASCII(C) && tryConsumeIdentifierUTF8Char(CurPtr)) {
- C = getCharAndSize(CurPtr, Size);
+ }
+ if (!isASCII(C) && tryConsumeIdentifierUTF8Char(CurPtr)) {
continue;
}
----------------
================
Comment at: clang/lib/Lex/Lexer.cpp:1789
}
+ // Neither an expected unicode codepoint nor a UCN
+ break;
----------------
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D108308/new/
https://reviews.llvm.org/D108308
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits