[PATCH] D108308: [WIP] Cleanup identifier parsing.

Aaron Ballman via Phabricator via cfe-commits Wed, 25 Aug 2021 12:18:09 -0700

aaron.ballman added a reviewer: rsmith.
aaron.ballman added a subscriber: rsmith.
aaron.ballman added a comment.


In general, I'm in favor of these changes. They help identify (pun *totally* 
intended) where we're improperly expecting ASCII identifiers in places, which 
can hopefully be addressed in follow-up work. @rsmith, do you have any concerns 
with this direction?

Can you remove the [WIP] from the title so it's clear that this is no longer in 
progress? Also, I'd recommend slapping an NFC in the title somewhere to make it 
clear there's no functional changes intended.



================
Comment at: clang/include/clang/Lex/Lexer.h:702
   // Helper functions to lex the remainder of a token of the specific type.
-  bool LexIdentifier         (Token &Result, const char *CurPtr);
+  bool LexIdentifierContinue(Token &Result, const char *CurPtr);
   bool LexNumericConstant    (Token &Result, const char *CurPtr);
----------------
Should this be `LexUnicodeIdentifierContinue()`? If so, perhaps it can also be 
moved up to line 578 so it's near the "start" function?

Or does this function handle both Unicode and ASCII identifiers? If so, the 
comments could probably be updated.


================
Comment at: clang/lib/Lex/Lexer.cpp:1758
+bool Lexer::LexIdentifierContinue(Token &Result, const char *CurPtr) {
+  // Match [_A-Za-z0-9]*, we have already matched [_A-Za-z$]
+  unsigned Size;
----------------
Is the comment here still accurate? Might be worth rewriting in prose rather 
than regex?


================
Comment at: clang/lib/Lex/Lexer.cpp:1762
+    unsigned char C = *CurPtr;
+    // Fast path
+    if (isAsciiIdentifierContinue(C)) {
----------------



================
Comment at: clang/lib/Lex/Lexer.cpp:1767
+    }
+    // Slow path: handle trigraph, unicode codepoints, UCNs
+    C = getCharAndSize(CurPtr, Size);
----------------



================
Comment at: clang/lib/Lex/Lexer.cpp:1783-1788
+    if (C == '\\' && tryConsumeIdentifierUCN(CurPtr, Size, Result)) {
       continue;
-    } else if (!isASCII(C) && tryConsumeIdentifierUTF8Char(CurPtr)) {
-      C = getCharAndSize(CurPtr, Size);
+    }
+    if (!isASCII(C) && tryConsumeIdentifierUTF8Char(CurPtr)) {
       continue;
     }
----------------



================
Comment at: clang/lib/Lex/Lexer.cpp:1789
     }
+    // Neither an expected unicode codepoint nor a UCN
+    break;
----------------



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108308/new/

https://reviews.llvm.org/D108308

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D108308: [WIP] Cleanup identifier parsing.

Reply via email to