aaron.ballman added inline comments.
================ Comment at: clang/include/clang/Basic/DiagnosticLexKinds.td:116-117 "source file is not valid UTF-8">; +def warn_invalid_utf8_in_comment : Warning< + "invalid UTF-8 in comment">, InGroup<DiagGroup<"invalid-utf8">>, DefaultIgnore; def err_character_not_allowed : Error< ---------------- What would it take to enable the diagnostic by default? We don't typically add off-by-default diagnostics because there's evidence that not many folks enable them. Alternatively, would this make sense as a pedantic warning? It's not really an extension for us to accept this stuff, but it seems like we can still pedantically diagnose the code? ================ Comment at: clang/lib/Lex/Lexer.cpp:2399 + getSourceLocation(CurPtr)); + bool UnicodeDecodeFailed = false; + ---------------- It looks like this can move into the `while` loop and we can remove some `= false` from within the loop? ================ Comment at: clang/lib/Lex/Lexer.cpp:2407 + // diagnostic only once per sequence that cannot be decoded. + while ((!WarnOnInvalidUtf8 || isASCII(C)) && C != 0 && // Potentially EOF. + C != '\n' && C != '\r') { // Newline or DOS-style newline. ---------------- Do you have some idea of performance cost for enabling the warning with a large TU? Are we talking .5%? 1%? 5%? noise? ================ Comment at: clang/lib/Lex/Lexer.cpp:2417 + if (Length == 0) { + if (!UnicodeDecodeFailed) { + Diag(CurPtr, diag::warn_invalid_utf8_in_comment); ---------------- aaron.ballman wrote: > So if unicode decoding didn't fail... diagnose? :-) The naming here seems unfortunate. ================ Comment at: clang/lib/Lex/Lexer.cpp:2417-2419 + if (!UnicodeDecodeFailed) { + Diag(CurPtr, diag::warn_invalid_utf8_in_comment); + } ---------------- ================ Comment at: clang/lib/Lex/Lexer.cpp:2698 + getSourceLocation(CurPtr)); + bool UnicodeDecodeFailed = false; + ---------------- I think this can move into the loop as well. ================ Comment at: clang/lib/Lex/Lexer.cpp:2764-2766 + if (!UnicodeDecodeFailed) { + Diag(CurPtr, diag::warn_invalid_utf8_in_comment); + } ---------------- Same concerns about naming here as above. ================ Comment at: llvm/lib/Support/ConvertUTF.cpp:426-429 + if (length > sourceEnd - source) { + return 0; + } + return isLegalUTF8(source, length) ? length : 0; ---------------- Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D128059/new/ https://reviews.llvm.org/D128059 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits