Re: [PATCH] D18296: [Sema] Handle UTF-8 invalid format string specifiers

Bruno Cardoso Lopes via cfe-commits Mon, 28 Mar 2016 13:35:51 -0700

bruno added a comment.

In http://reviews.llvm.org/D18296#384766, @rsmith wrote:


> This patch builds a length-1 `ConversionSpecifier` but includes the complete 
> code point in the length of the overall format specifier, which is 
> inconsistent. Please either treat the trailing bytes as part of the 
> `ConversionSpecifier` or revert the changes to `ParsePrintfSpecifier` and 
> handle this entirely within `HandleInvalidConversionSpecifier`.


Ok, gonna handle this entirely within `HandleInvalidConversionSpecifier` then.

> Does the same problem exist when parsing `scanf` specifiers?


Yes, I missed that, will update accordingly.


================
Comment at: lib/Analysis/PrintfFormatString.cpp:324
@@ +323,3 @@
+    // UTF-8 sequence. If that's the case, adjust the length accordingly.
+    if (Start + 1 < I && !llvm::sys::locale::isPrint(FirstByte) &&
+        isLegalUTF8String(&SB, SE))
----------------
rsmith wrote:
> The interpretation of a format string by `printf` should not depend on the 
> locale, so our parsing of a format string should not either.
llvm::sys::locale::isPrint does not actually do any locale specific check 
(maybe it should be moved elsewhere for better consitency?):

  bool isPrint(int UCS) {
  #if LLVM_ON_WIN32
    // Restrict characters that we'll try to print to the lower part of ASCII
    // except for the control characters (0x20 - 0x7E). In general one can not
    // reliably output code points U+0080 and higher using narrow character 
C/C++
    // output functions in Windows, because the meaning of the upper 128 codes 
is
    // determined by the active code page in the console.
    return ' ' <= UCS && UCS <= '~';
  #else
    return llvm::sys::unicode::isPrintable(UCS);
  #endif
  }

This logic is needed anyway though. Suggestions?


http://reviews.llvm.org/D18296



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Re: [PATCH] D18296: [Sema] Handle UTF-8 invalid format string specifiers

Reply via email to