tahonermann accepted this revision as: tahonermann. tahonermann added a comment. This revision is now accepted and ready to land.
> @tahonermann gentle ping (Aaron told me you might have further comments) I'm sorry for the delay. I ran out of time to do the thorough review I would have liked to do, but I did scroll through everything now and did not find anything concerning; Aaron clearly conducted a thorough review already. It looks great to me, really nice work! ================ Comment at: clang/lib/Lex/Lexer.cpp:3255-3260 + if (!isAlphanumeric(C) && C != '_' && C != '-' && C != ' ') + break; + + if ((C < 'A' || C > 'Z') && !llvm::isDigit(C) && C != ' ' && C != '-') { + Invalid = true; + } ---------------- cor3ntin wrote: > tahonermann wrote: > > It isn't clear to me why there are two separate `if` statements here. I > > would expect the following to suffice. If I'm misunderstanding something, > > then comments might be helpful. > > if (!isUppercase(C) && !isDigit(C) && C != '-' && C != ' ') { > > Invalid = true; > > break; > > } > > > > I'm not sure why there is a test for '_' since that character is not > > present in the grammar for `n-char` in P2071. > > > > Is it intentional that there is no `break` statement in the second `if` > > statement? > I improved that, what it does should be more clear now. More importantly, I > added a diagnostic note when we detect a loose match. > > We allow `_` because Unicode does. > We first perform a strict match - which fails as no Unicode name contains an > underscore, we emit a diagnostic, and then we try a loose matching which does > allow `_`. > This enable us to produces better diagnostics > > ``` > <source>:2:18: error: 'GREEK_SMALL_LETTER-OMICRON' is not a valid Unicode > character name > const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break > space}"; > ^ > <source>:2:20: note: characters names in Unicode escape sequences are > sensitive to case and whitespaces > const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break > space}"; > ^~~~~~~~~~~~~~~~~~~~~~~~~~ > GREEK SMALL LETTER OMICRON > <source>:2:54: error: 'zero width no break space' is not a valid Unicode > character name > const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break > space}"; > ^~~~~~~~~~~~~~~~~~~~~~~~~ > <source>:2:54: note: characters names in Unicode escape sequences are > sensitive to case and whitespaces > const char* \N{GREEK_SMALL_LETTER-OMICRON} = "\N{zero width no break > space}"; > ^~~~~~~~~~~~~~~~~~~~~~~~~ > ZERO WIDTH NO-BREAK > SPACE``` > That's great! Very nice! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D123064/new/ https://reviews.llvm.org/D123064 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits