cor3ntin added a comment.

In D104975#2842425 <https://reviews.llvm.org/D104975#2842425>, @jfb wrote:

> It would be more user friendly to say which character is not allowed in the 
> diagnostic.

Agreed

> Do we need to have a backwards compatible flag to preserve the old behavior, 
> or do we believe that nobody will be affected by the change? We should make 
> this choice explicitly and state it in the patch description.

I have been wondering about that.
I came to the conclusion it would probably not be worth it. Until fairly 
recently Unicode identifiers in GCC were not really usable and therefore not 
used afaik.
I haven't seen people use emojis or other interesting symbol in non toy code.

I'll try to make that clearer in the commit message



================
Comment at: clang/lib/Lex/Lexer.cpp:1462
+          static const llvm::sys::UnicodeCharSet XIDStartChars(XIDStartRanges);
+          return C == '_' || XIDStartChars.contains(C);
+  } else if (LangOpts.C11) {
----------------
Quuxplusone wrote:
> This is overly indented (or your editor snuck in some hard tabs).
> Why is `'_'` treated specially, here and on line 1444 above? Shouldn't it 
> just be included in the `XIDStartChars` set?
This is intentional
C++ treats `_` as valid in identifiers, Unicode doesn't. Putting it in the 
array with the other characters is a sure-fire way to have someone drop it



================
Comment at: clang/lib/Lex/Lexer.cpp:1503-1511
-  // Check C++98 compatibility.
-  if (!Diags.isIgnored(diag::warn_cxx98_compat_unicode_id, Range.getBegin())) {
-    static const llvm::sys::UnicodeCharSet CXX03AllowedIDChars(
-        CXX03AllowedIDCharRanges);
-    if (!CXX03AllowedIDChars.contains(C)) {
-      Diags.Report(Range.getBegin(), diag::warn_cxx98_compat_unicode_id)
-        << Range;
----------------
Quuxplusone wrote:
> Why remove this diagnostic? This looks unintentional.
> 
> But if it is intentional, then besides justifying it in the commit message 
> (and splitting out this removal into a separate PR), you should remove the 
> enumerator from `DiagnosticLexKinds.td`.
The diagnostic should be removed indeed.
Because the PR applies the paper as a DR to all versions of C++, it is no 
longer useful to diagnostic changes between c++ versions


================
Comment at: clang/test/Lexer/unicode.c:31
+extern int 🌹; // expected-error {{character not allowed in identifier}} 
expected-warning {{declaration does not declare anything}}
+int v=[=](auto){return~x;}();;  // expected-error 12{{character not allowed in 
identifier}} expected-error {{expected ';'}}
+// expected-error@-1 {{expected unqualified-id}}
----------------
Quuxplusone wrote:
> Why 12 errors? Are these `{` `(` etc. characters not the ASCII versions? If 
> so, this test is needlessly confusing, and should be rewritten with one 
> character per test case, as in lines 43-53 below. (Except line 46, which is 
> also bad. :))
> 
> I don't understand why you're disabling the existing tests. In particular, 
> line 43 tests the error message for GREEK QUESTION MARK, which is important 
> to preserve.
Having 2 sets of completely different diagnostics (for C++ and C) proved a lot 
more complicated than doing two separate tests.
The `GREEK QUESTION MARK` is no longer important to preserve - it tests for 
confusing characters - which cannot happen in C++.
I agree that the test above is not super meaningful.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104975/new/

https://reviews.llvm.org/D104975

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to