cor3ntin added a comment. In D104975#2842425 <https://reviews.llvm.org/D104975#2842425>, @jfb wrote:
> It would be more user friendly to say which character is not allowed in the > diagnostic. Agreed > Do we need to have a backwards compatible flag to preserve the old behavior, > or do we believe that nobody will be affected by the change? We should make > this choice explicitly and state it in the patch description. I have been wondering about that. I came to the conclusion it would probably not be worth it. Until fairly recently Unicode identifiers in GCC were not really usable and therefore not used afaik. I haven't seen people use emojis or other interesting symbol in non toy code. I'll try to make that clearer in the commit message ================ Comment at: clang/lib/Lex/Lexer.cpp:1462 + static const llvm::sys::UnicodeCharSet XIDStartChars(XIDStartRanges); + return C == '_' || XIDStartChars.contains(C); + } else if (LangOpts.C11) { ---------------- Quuxplusone wrote: > This is overly indented (or your editor snuck in some hard tabs). > Why is `'_'` treated specially, here and on line 1444 above? Shouldn't it > just be included in the `XIDStartChars` set? This is intentional C++ treats `_` as valid in identifiers, Unicode doesn't. Putting it in the array with the other characters is a sure-fire way to have someone drop it ================ Comment at: clang/lib/Lex/Lexer.cpp:1503-1511 - // Check C++98 compatibility. - if (!Diags.isIgnored(diag::warn_cxx98_compat_unicode_id, Range.getBegin())) { - static const llvm::sys::UnicodeCharSet CXX03AllowedIDChars( - CXX03AllowedIDCharRanges); - if (!CXX03AllowedIDChars.contains(C)) { - Diags.Report(Range.getBegin(), diag::warn_cxx98_compat_unicode_id) - << Range; ---------------- Quuxplusone wrote: > Why remove this diagnostic? This looks unintentional. > > But if it is intentional, then besides justifying it in the commit message > (and splitting out this removal into a separate PR), you should remove the > enumerator from `DiagnosticLexKinds.td`. The diagnostic should be removed indeed. Because the PR applies the paper as a DR to all versions of C++, it is no longer useful to diagnostic changes between c++ versions ================ Comment at: clang/test/Lexer/unicode.c:31 +extern int 🌹; // expected-error {{character not allowed in identifier}} expected-warning {{declaration does not declare anything}} +int v=[=](auto){return~x;}();; // expected-error 12{{character not allowed in identifier}} expected-error {{expected ';'}} +// expected-error@-1 {{expected unqualified-id}} ---------------- Quuxplusone wrote: > Why 12 errors? Are these `{` `(` etc. characters not the ASCII versions? If > so, this test is needlessly confusing, and should be rewritten with one > character per test case, as in lines 43-53 below. (Except line 46, which is > also bad. :)) > > I don't understand why you're disabling the existing tests. In particular, > line 43 tests the error message for GREEK QUESTION MARK, which is important > to preserve. Having 2 sets of completely different diagnostics (for C++ and C) proved a lot more complicated than doing two separate tests. The `GREEK QUESTION MARK` is no longer important to preserve - it tests for confusing characters - which cannot happen in C++. I agree that the test above is not super meaningful. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D104975/new/ https://reviews.llvm.org/D104975 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits