fcloutier added a comment.

Thanks for looking, Aaron. You're right that the main utility of the 
aggregation of format warnings is to extend C's type checking because there is 
no other good way, or good place, to do it. I have built hundreds of millions 
of shipping lines of C, C++ and Objective-C, and this change seems like it 
would be an effective fix in several places where we don't currently have 
anywhere else to go.

For variadic templates, you're right that at some final instantiation, the 
compiler will have all the format argument types in hand. What gets lost in 
piping and substitution is actually the format string that Clang must 
type-check against. You can see this in action in the LLVM code base, actually: 
if we turn on -Wformat-nonliteral for files including 
llvm/include/Support/Format.h, the warning will trigger for users of the 
`llvm::format` function. This is because the format string is stored in 
`format_object_base::Fmt` instead of being directly forwarded, and sprinkling 
`constexpr` in strategic places won't resolve this issue because SemaChecking 
has its own custom expression evaluator. Swapping it out for the more common 
ExprConstant stuff is probably not impossible, but it's difficult because 
SemaChecking supports its own kind of symbolism. For instance, it's OK to use a 
format string parameter as the format string argument of another function that 
wants one, and other attributes like `format_arg` can tell you to assume the 
same specifiers as you get from another expression. So, we could fix this, but 
it would be more work, and the same purpose is served by allowing fixed 
arguments to participate in the `format` attribute checking. Verifying that 
users of the LLVM `llvm::format` function are doing the right thing is a better 
experience than letting this bubble up from however many levels of template 
instantiation there is.

Type checking may well actually need to be tested better for the case where 
variadic argument promotions don't happen, but there's a fairly finite number 
of ways it could go wrong, and I still think that's the easier way to go.

I don't think that we can easily distinguish between parameters created by 
variadic templates and fixed parameters from a function without variadic 
templates. I also think that most people who write functions like llvm::format 
aggressively do not want to be in charge of type-checking the format string 
themselves, and would much rather defer to Clang's existing type checker. If 
Clang allows the format attribute to type-check parameters created by variadic 
templates, it follows that the path of least resistance is to allow it on 
functions with fixed arguments.

With that said, there are still use cases for allowing format strings in 
functions with fixed arguments. (Interestingly, it would not be possible to mix 
fixed format arguments with variadic format arguments in a C function, so maybe 
we can prevent that altogether, but it does not remove from the general 
usefulness of the feature). You'll have to take my word for this, but it's one 
of the handful of blockers that we have for very broad adoption of 
-Wformat-nonliteral. The story usually goes like this: there's some function 
`NSString *foo(NSString *fmt, NSMyFoo *obj1, NSMyBar *obj2)` that needs to do 
something with `obj1` and `obj2` before it decides whether it actually wants to 
print them, return something unrelated altogether, throw an exception, etc. 
General limitations of the C language make it difficult to untangle the print 
parts from the logic parts for reasons which, to me, align rather closely to 
compiler capriciousness given how close it is to supporting the feature. This 
is more common in Objective-C code bases as %@ is a universal format specifier 
for all object types.

Another way we could approach this problem is to have a new attribute to say 
that a format argument must have a string that conforms to another constant 
format string. For instance, I think that `NSString *foo(NSString *fmt, NSMyFoo 
*obj1, NSMyBar *obj2) __attribute__((format_like(1, @"hello %@ %@!")))`, by the 
compiler checks that `fmt` has specifiers equivalent to %@ %@, would give us 
roughly the same safety improvements. It would allow for even more fun things, 
like passing a format string to a function that will then itself supply 
arguments to a formatting function.

What do you think?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112579/new/

https://reviews.llvm.org/D112579

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to