[PATCH] D124996: [clang][preprocessor] Fix unsigned-ness of utf8 char literals

Tom Honermann via Phabricator via cfe-commits Mon, 09 May 2022 09:52:44 -0700

tahonermann added inline comments.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:5-7
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++17 -fsyntax-only 
-fchar8_t -DCHAR8_T -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -fsyntax-only 
-verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -fsyntax-only 
-fno-char8_t -DNO_CHAR8_T -verify %s
----------------
The `-DCHAR8_T` and `-DNO_CHAR8_T` options can be removed now since the test is 
no longer dependent on them.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:35-60
+#if __cplusplus == 201703L
+#  if defined(__cpp_char8_t)
+#    if u8'\xff' == '\xff' // expected-warning {{right side of operator 
converted from negative value to unsigned}}
+#      error Something's not right.
+#    endif
+#  else
+#    if u8'\xff' != '\xff'
----------------
Something is still not right here. `-std=c++17` should behave the same as 
`-std=c++20 -fno-char8_t`. Likewise, `-std=c++17 -fchar8_t` should behave the 
same as `-std=c++20`. Basically, if `__cpp_char8_t` is defined, then `u8'\xff'` 
should be unsigned and if that macro isn't defined, then the result should be 
signed (for an 8-bit signed char type). But the reversed conditions for the 
C++17 vs C++20 tests above conflict with that expectation. The code changes 
look right. Is the test actually passing like this?

Since the tests are now conditioned on `__cpp_char8_t`, I think they should be 
merged. I suggest:
  // UTF-8 character literals are enabled in C++17 and later. If `-fchar8_t` is 
not enabled
  // (as is the case in C++17), then UTF-8 character literals may produce 
signed or
  // unsigned values depending on whether char is a signed type. If `-fchar8_t` 
is enabled
  // (which is the default behavior for C++20), then UTF-8 character literals 
always
  // produce unsigned values. The tests below depend on the target having a 
signed
  // 8-bit char so that '\xff' produces a negative value.
  #if __cplusplus >= 201703L
  #  if !defined(__cpp_char8_t)
  #    if !(u8'\xff' == '\xff')
  #      error UTF-8 character value did not match ordinary character literal; 
this is unexpected
  #    endif
  #  else
  #    if u8'\xff' == '\xff' // expected-warning {{right side of operator 
converted from negative value to unsigned}}
  #      error UTF-8 character value matched ordinary character literal; this 
is unexpected
  #    endif
  #  endif
  #endif


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:37-47
+#if __cplusplus == 201703L
+#  if defined(CHAR8_T)
+#    if u8'\xff' == '\xff' // expected-warning {{right side of operator 
converted from negative value to unsigned}}
+#      error Something's not right.
+#    endif
+#  else
+#    if u8'\xff' != '\xff'
----------------
tbaeder wrote:
> tahonermann wrote:
> > aaron.ballman wrote:
> > > The equality operators seem backwards to what @tahonermann was saying -- 
> > > I read his comment as:
> > > 
> > > C++17/14/11: u8'\xff' == '\xff'
> > > C++17/14/11, -fchar8_t: u8'\xff' != '\xff'
> > > C++20 and up: u8'\xff' != '\xff'
> > > C++20 and up, -fno-char8_t: u8'\xff' == '\xff'
> > > 
> > > Hopefully Tom can clarify if I misunderstood.
> > Yes, that looks right (as long as the target has a signed `char` type).
> Are you listing the positive conditions (that should be true) or negative 
> ones? The conditions in the test case need to be false in order for the test 
> case to succeed.
Yes, these list the expected behavior (e.g., the assertable behavior).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124996/new/

https://reviews.llvm.org/D124996

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D124996: [clang][preprocessor] Fix unsigned-ness of utf8 char literals

Reply via email to