[PATCH] D45532: [StaticAnalyzer] Checker to find uninitialized fields after a constructor call

Umann Kristóf via Phabricator via cfe-commits Tue, 15 May 2018 08:13:44 -0700

Szelethus added a comment.

I'm afraid it'll be even more time before I post a new diff. There are some 
things that `ProgramState` is lacking, so I'll get that fixed first in a 
different pull request first.




================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:258-260
+    Report->addNote(NoteBuf,
+                    PathDiagnosticLocation::create(FieldChain.getEndOfChain(),
+                                                   
Context.getSourceManager()));
----------------
NoQ wrote:
> NoQ wrote:
> > Szelethus wrote:
> > > NoQ wrote:
> > > > Aha, ok, got it, so you're putting the warning at the end of the 
> > > > constructor and squeezing all uninitialized fields into a single 
> > > > warning by presenting them as notes.
> > > > 
> > > > Because for now notes aren't supported everywhere (and aren't always 
> > > > supported really well), i guess it'd be necessary to at least provide 
> > > > an option to make note-less reports before the checker is enabled by 
> > > > default everywhere. In this case notes contain critical pieces of 
> > > > information and as such cannot be really discarded.
> > > > 
> > > > I'm not sure that notes are providing a better user experience here 
> > > > than simply saying that "field x.y->z is never initialized". With 
> > > > notes, the user will have to scroll around (at least in scan-build) to 
> > > > find which field is uninitialized.
> > > > 
> > > > Notes will probably also not decrease the number of reports too much 
> > > > because in most cases there will be just one uninitialized field. 
> > > > Though i agree that when there is more than one report, the user will 
> > > > be likely to deal with all fields at once rather than separately.
> > > > 
> > > > So it's a bit wonky.
> > > > 
> > > > We might want to enable one mode or another in the checker depending on 
> > > > the analyzer output flag. Probably in the driver 
> > > > (`RenderAnalyzerOptions()`).
> > > > 
> > > > It'd be a good idea to land the checker into alpha first and fix this 
> > > > in a follow-up patch because this review is already overweight.
> > > > [...]i guess it'd be necessary to at least provide an option to make 
> > > > note-less reports before the checker is enabled by default everywhere. 
> > > > In this case notes contain critical pieces of information and as such 
> > > > cannot be really discarded.
> > > This can already be achieved with `-analyzer-config 
> > > notes-as-events=true`. There no good reason however not to make an 
> > > additional flag that would emit warnings instead of notes.
> > > > I'm not sure that notes are providing a better user experience here 
> > > > than simply saying that "field x.y->z is never initialized". With 
> > > > notes, the user will have to scroll around (at least in scan-build) to 
> > > > find which field is uninitialized.
> > > This checker had a previous implementation that emitted a warning for 
> > > each uninitialized field, instead of squeezing them in a single warning 
> > > per constructor call. One of the major reasons behind rewriting it was 
> > > that it emitted so many of them that it was difficult to differentiate 
> > > which warning belonged to which constructor call. Also, in the case of 
> > > the LLVM/Clang project, the warnings from this checker alone would be in 
> > > the thousands.
> > > > Notes will probably also not decrease the number of reports too much 
> > > > because in most cases there will be just one uninitialized field.
> > > While this is true to some extent, it's not uncommon that a single 
> > > constructor call leaves 7 uninitialized -- in the LLVM/Clang project I 
> > > found one that had over 30!
> > > Since its practically impossible to determine whether this was a 
> > > performance enhancement or just poor programming, the few cases where it 
> > > is intentional, an object would emit many more warnings would make  make 
> > > majority of the findings (where an object truly had only 1 or 2 fields 
> > > uninit) a lot harder to spot in my opinion.
> > > 
> > > In general, I think one warning per constructor call makes the best user 
> > > experience, but a temporary solution should be implemented until there's 
> > > better support for notes.
> > > 
> > > I agree, this should be a topic of a follow-up patch.
> > > 
> > > This can already be achieved with `-analyzer-config notes-as-events=true`.
> > 
> > Yep. But the resulting user experience seems pretty bad to me. It was a 
> > good quick proof-of-concept trick to test things, but the fact that notes 
> > aren't path pieces is way too apparent in case of this checker. So i guess 
> > this flag was a bad idea.
> > 
> > > Also, in the case of the LLVM/Clang project, the warnings from this 
> > > checker alone would be in the thousands.
> > 
> > This makes me a bit worried, i wonder what's the reason why does the 
> > checker shout so loudly on a codebase that doesn't seem to have actual 
> > uninitialized value bugs all over the place.
> > 
> > Are any of these duplicates found in the same constructor for the same 
> > field in different translation units? Suppose we discard the duplicates (if 
> > any) and warn exactly once (across the whole LLVM) for every uninitialized 
> > field (which is probably more than once per constructor). Then I wonder:
> > 1. How many uninitialize fields would we be able to find this way?
> > 2. How many of such fields were intentionally left uninitialized?
> > 3. How many were found by deep inspection of the heap through field chains 
> > involving pointer dereference "links"?
> (when i'm asking this sort of stuff, i don't mean you should look through 
> thousands of positives, a uniform random sample of ~50 should be just fine)
> (but we really do need this info before enabling stuff by default)
>>This can already be achieved with -analyzer-config notes-as-events=true.
>Yep. But the resulting user experience seems pretty bad to me. It was a good 
>quick proof-of-concept trick to test things, but the fact that notes aren't 
>path pieces is way too apparent in case of this checker. So i guess this flag 
>was a bad idea.

I totally agree, I meant that a note-less solution should only be a (arguably 
better) workaround just as `notes-as-events=true`.

>Are any of these duplicates found in the same constructor for the same field 
>in different translation units? Suppose we discard the duplicates (if any) and 
>warn exactly once (across the whole LLVM) for every uninitialized field (which 
>is probably more than once per constructor).

Sure, having the same field field reported from the same constructor call in 
different TUs happens quite a lot, but they can easily be uniqued.

In the checkers current state, meaning that one warning per ctor call, the 
LLVM/Clang project in pedantic mode resulted in **409 non-unique warnings**, 
and using CodeChecker I found that **181 of these is unique**.

>How many uninitialize fields would we be able to find this way?
In its current state, in pedantic mode, after uniqueing **~581**. Now that's 
quite a bit off from what I remembered //"the warnings from this checker alone 
would be in the thousands"//, but having one warning per uninit field would 
still result in a ~320% increase in reports.
>How many of such fields were intentionally left uninitialized?
(I checked around 95 reports a couple weeks back before uploading this diff)
**Almost all of them.** Note that this is a very performance-critical project. 
From what I saw, many times several fields are left uninitialized, but these 
fields are inaccessible due to a `kind` field being asserted at their getter 
functions.
Also, I found cases where the constructor wasn't defaulted with `= default;`.

In fact, I struggled to find a case where a field was 100% left out by 
accident. Maybe in `llvm/lib/IR/ConstantsContext.h`, calling 
`ConstantExprKeyType(ArrayRef<Constant *> Operands, const ConstantExpr *CE)`.

This is not the case however in other projects I checked. In Xerces for example 
every find was a true positive.

>How many were found by deep inspection of the heap through field chains 
>involving pointer dereference "links"?
This is somewhat harder to measure. The best I could come up with is uniqueing 
the fieldchains from the plist files with lexicographical sorting. Your 
question can be answered by measuring  the length of the fieldchains. I found
* 3 fieldchains with the length of 5
* 18 fieldchains with the length of 4
* 62 fieldchains with the lenfth of 3
* 108 fieldchains with the length of 2
* 137 fieldchains with the length of 1
(that is a total of 328 unique fieldchains lexicographically)
Because heap objects are not modeled, only 3 of the 328 fieldchains contains 
pointer dereferences. However, reference objects are very common from what I 
saw, but sadly I can't give you an exact number on those.

I hope these answers were satisfying, but I'll follow up with more information 
if needed.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:359-372
+  // Checking bases.
+  const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD);
+  if (!CXXRD)
+    return ContainsUninitField;
+
+  for (const CXXBaseSpecifier BaseSpec : CXXRD->bases()) {
+    const auto *Base = BaseSpec.getType()->getAsCXXRecordDecl();
----------------
NoQ wrote:
> Are there many warnings that will be caused by this but won't be caused by 
> conducting the check for the constructor-within-constructor that's currently 
> disabled? Not sure - is it even syntactically possible to not initialize the 
> base class?
I'm not a 100% sure what you mean. Can you clarify?
>Not sure - is it even syntactically possible to not initialize the base class?
If I understand the question correctly, no, as far as I know.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:430
+    if (T->isUnionType()) {
+      // TODO: does control ever reach here?
+      if (isUnionUninit(R)) {
----------------
NoQ wrote:
> Add `llvm_unreachable("")` to find out :)
It does! :)


https://reviews.llvm.org/D45532



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D45532: [StaticAnalyzer] Checker to find uninitialized fields after a constructor call

Reply via email to