aaron.ballman added reviewers: tahonermann, clang-language-wg.
aaron.ballman added a comment.

Thanks for picking this back up! I have mostly nits, a few questions, and am 
adding some extra reviewers who may have an opinion. Also, I think the summary 
needs to be updated (this is no longer waiting on the unevaluated strings 
stuff), and the changes need a release note.



================
Comment at: clang/lib/Basic/Diagnostic.cpp:809
+  OutStr.reserve(OutStr.size() + Str.size());
+  const unsigned char *Begin =
+      reinterpret_cast<const unsigned char *>(Str.data());
----------------



================
Comment at: clang/lib/Basic/Diagnostic.cpp:821
+      llvm::UTF32 CodepointValue;
+      llvm::UTF32 *CpPtr = &CodepointValue;
+      const unsigned char *CodepointBegin = Begin;
----------------
We don't really need this variable, we can use `&CodepointValue` in the two 
places it's needed.


================
Comment at: clang/lib/Basic/Diagnostic.cpp:825
+          Begin + llvm::getNumBytesForUTF8(*Begin);
+      llvm::ConversionResult res = llvm::ConvertUTF8toUTF32(
+          &Begin, CodepointEnd, &CpPtr, CpPtr + 1, llvm::strictConversion);
----------------



================
Comment at: clang/lib/Basic/Diagnostic.cpp:828
+      (void)res;
+      assert(llvm::conversionOK == res);
+      assert(Begin == CodepointEnd &&
----------------
It'd be worth adding a message here as well. Something like `&& "the sequence 
is legal UTF-8 but we couldn't convert it to UTF-32"`


================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16591
+      if (AssertMessage) {
+        StringLiteral *MsgStr = cast<StringLiteral>(AssertMessage);
+        if (MsgStr->getCharByteWidth() == 1)
----------------



================
Comment at: clang/lib/Sema/SemaDeclCXX.cpp:16592
+        StringLiteral *MsgStr = cast<StringLiteral>(AssertMessage);
+        if (MsgStr->getCharByteWidth() == 1)
+          Msg << MsgStr->getString();
----------------
Do we want to use something like `isASCII()` here instead of just checking for 
a single-byte width? e.g., pascal strings are single byte width, but probably 
not something we want printed this way.

Another question is, do we want to do something slightly slower (since we know 
we're already on the failure path, the performance here shouldn't matter overly 
much) by checking `!containsNonAsciiOrNull()`?


================
Comment at: llvm/lib/Support/Unicode.cpp:300
 ///   * 1 for all remaining characters.
-static inline int charWidth(int UCS)
-{
+static inline int charWidth(int UCS) {
   if (!isPrintable(UCS))
----------------
Unrelated formatting change?


================
Comment at: llvm/lib/Support/Unicode.cpp:377
 } // namespace llvm
-
----------------
Unrelated formatting change?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108469/new/

https://reviews.llvm.org/D108469

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to