cor3ntin added inline comments.

================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16899
+          uint32_t CodePoint = 
static_cast<uint32_t>(V.getInt().getZExtValue());
+          PrintCharLiteralPrefix(BTy->getKind(), OS);
+          OS << '\'';
----------------
hubert.reinterpretcast wrote:
> hubert.reinterpretcast wrote:
> > cor3ntin wrote:
> > > Looking at the diagnostics, I don't think it makes sense to print a 
> > > prefix here. You could just leave that part out.
> > Why is removing the prefix better? The types can matter (characters outside 
> > the basic character set are allowed to have negative `char` values). Also, 
> > moving forward, the value of a character need not be the same in the 
> > various encodings.
> Some fun with signedness (imagine a more realistic example with `ISO-8859-1` 
> ordinary character encoding with a signed `char` type):
> ```
> $ clang -Xclang -fwchar-type=short -xc++ -<<<$'static_assert(L"\\uFF10"[0] == 
> U\'\\uFF10\');'
> <stdin>:1:15: error: static assertion failed due to requirement 'L"\xFF10"[0] 
> == U'\uff10''
>     1 | static_assert(L"\uFF10"[0] == U'\uFF10');
>       |               ^~~~~~~~~~~~~~~~~~~~~~~~~
> <stdin>:1:28: note: expression evaluates to ''0' (0xFF10) == '0' (0xFF10)'
>     1 | static_assert(L"\uFF10"[0] == U'\uFF10');
>       |               ~~~~~~~~~~~~~^~~~~~~~~~~~
> 1 error generated.
> Return:  0x01:1   Fri Aug 11 23:49:02 2023 EDT
> ```
Either we care about the actual character - ie `'a'`, or it's value (ie `42`). 
The motivation for the current patch is to add the value to the diagnostic 
message.
I'm also concerned about mixing things that are are are not lexical elements in 
the diagnostics


================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16930-16936
+        case BuiltinType::Char_S:
+        case BuiltinType::Char_U:
+        case BuiltinType::Char8:
+        case BuiltinType::Char16:
+        case BuiltinType::Char32:
+        case BuiltinType::WChar_S:
+        case BuiltinType::WChar_U: {
----------------
hubert.reinterpretcast wrote:
> Add FIXME for `char` and `wchar_t` cases that this assumes Unicode literal 
> encodings.
If we wanted such fixme, it should be L1689.


================
Comment at: clang/test/SemaCXX/static-assert.cpp:287
+  static_assert((char16_t)L'ゆ' == L"C̵̭̯̠̎͌ͅť̺"[1], ""); // expected-error 
{{failed}} \
+                                                  // expected-note {{evaluates 
to ''ゆ' (0x3086) == '̵' (0x335)'}}
+  static_assert(L"\/"[1] == u'\xFFFD', ""); // expected-error {{failed}} \
----------------
hubert.reinterpretcast wrote:
> The C++23 escaped string formatting facility would not generate a trailing 
> combining character like this. I recommend following suit.
> 
> Info on U+0335: https://util.unicode.org/UnicodeJsps/character.jsp?a=0335
> 
This is way outside the scope of the patch. The diagnostic output facility has 
no understanding of combining characters or graphemes and do not attempt to 
match std::print. It probably would be an improvement but this patch is not 
trying to modify how all diagnostics are printed. (all of that logic is in 
Diagnostic.cpp)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155610/new/

https://reviews.llvm.org/D155610

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to