bruno added a comment. In http://reviews.llvm.org/D18296#384766, @rsmith wrote:
> This patch builds a length-1 `ConversionSpecifier` but includes the complete > code point in the length of the overall format specifier, which is > inconsistent. Please either treat the trailing bytes as part of the > `ConversionSpecifier` or revert the changes to `ParsePrintfSpecifier` and > handle this entirely within `HandleInvalidConversionSpecifier`. Ok, gonna handle this entirely within `HandleInvalidConversionSpecifier` then. > Does the same problem exist when parsing `scanf` specifiers? Yes, I missed that, will update accordingly. ================ Comment at: lib/Analysis/PrintfFormatString.cpp:324 @@ +323,3 @@ + // UTF-8 sequence. If that's the case, adjust the length accordingly. + if (Start + 1 < I && !llvm::sys::locale::isPrint(FirstByte) && + isLegalUTF8String(&SB, SE)) ---------------- rsmith wrote: > The interpretation of a format string by `printf` should not depend on the > locale, so our parsing of a format string should not either. llvm::sys::locale::isPrint does not actually do any locale specific check (maybe it should be moved elsewhere for better consitency?): bool isPrint(int UCS) { #if LLVM_ON_WIN32 // Restrict characters that we'll try to print to the lower part of ASCII // except for the control characters (0x20 - 0x7E). In general one can not // reliably output code points U+0080 and higher using narrow character C/C++ // output functions in Windows, because the meaning of the upper 128 codes is // determined by the active code page in the console. return ' ' <= UCS && UCS <= '~'; #else return llvm::sys::unicode::isPrintable(UCS); #endif } This logic is needed anyway though. Suggestions? http://reviews.llvm.org/D18296 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits