NoQ added a comment.

Had a look at the actual code, came up with some comments, most are fairly 
minor, so good job! You did a good job explaining the overall idea, but a lot 
of specifics were difficult to understand, so i made a few comments on how to 
make the code a bit easier to read.

> I know, and I tried to look for ways to split this checker into smaller 
> logical chunks, but I didn't find a satisfying solution.

What i would have done:

1. A checker that straightforwardly iterates through immediate fields of the 
current object and reports uninitialized fields.
2. Add base classes support.
3. Heuristic for skipping sub-constructors could use a separate review.
4. Start skipping arrays and unions.
5. Add simple pointer dereference support - introduce chains of fields.
6. More heuristics - the one for symbolic regions, the one for system header 
fields, etc.

etc.

I'm asking for this because it's, like, actually difficult and painful to 
understand formal information in large chunks. One-solution-at-a-time would 
have been so much easier. It is really a bad idea to start by presenting a 
working solution; the great idea is to present a small non-working solution 
with a reasonable idea behind it, and then extend it "in place" as much as it 
seems necessary. It is very comfortable when parts of the patch with different 
"status" (main ideas, corner-case fixes, heuristics, refactoring) stay in 
separate patches. Alpha checkers are essentially branches: because our svn 
doesn't have branches, we have alpha checkers and off-by-default flags. Even 
before step 1, you should be totally fine with committing an empty checker with 
a single empty callback and a header comment - even on such "step 0." we would 
be able to discuss if your approach to the checker seems right or might have 
inevitable issues, and it could have saved a lot of time.

Essentially every part of the solution that you implement, if you think it 
might deserve a separate discussion, also deserves a separate patch. It is 
reasonable to split the work into logical chunks before you start it. It's very 
unlikely that you have a single idea that takes 500 non-trivial lines of code 
to express. Splitting up ideas so that they were easy to grasp is an invaluable 
effort. I had very positive experience with that recently, both as a coder and 
as a reviewer, so i encourage that a lot.



================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:47
+class FieldChainInfo {
+  using FieldChain = llvm::SmallVector<const FieldRegion *, 10>;
+
----------------
At a glance it looks like a great use case for an immutable list. It's easy to 
use the immutable list as a stack, and it's also O(1) to copy/move and take 
snapshots of it.

Note that you copy 10 pointers every time you copy //or move// a `SmallVector`. 
Small vectors don't make much sense as fields of actively used data structures 
- their main purpose is to live on the stack as local variables.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:51-53
+  /// If this is a fieldchain whose last element is an uninitialized region of 
a
+  /// pointer type, this variable will store whether the pointer itself or the
+  /// pointee is uninitialized.
----------------
Please move this doc to the getter/setter function. That's where people expect 
to find it when they're reading code.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:58-59
+  FieldChainInfo() = default;
+  /// Delegates to the copy ctor and adds FR to Chain. Note that Chain cannot 
be
+  /// modified in a FieldChainInfo object after its construction.
+  FieldChainInfo(const FieldChainInfo &Other, const FieldRegion *FR);
----------------
Do we really need the other field (`IsDereferenced`) to be mutable? Or maybe we 
can keep the whole object immutable? I.e., instead of `dereference()` we'll get 
a constructor that constructs a `FieldChainInfo` with the same chain but with 
the dereferenced flag set.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:86-89
+  StoreManager &StoreMgr;
+  MemRegionManager &MrMgr;
+  SourceManager &SrcMgr;
+  Store S;
----------------
All of these can be replaced with a single `ProgramStateRef` object.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:93
+  const bool IsPedantic;
+  bool IsChecked = false;
+  bool IsAnyFieldInitialized = false;
----------------
This field is worth documenting.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:156-157
+  //
+  // We'll traverse each node of the above graph with the appropiate one of
+  // these methods:
+  bool isUnionUninit(const TypedValueRegion *R);
----------------
Please explain what these methods do in a more direct manner. Something like 
"This method returns true if an uninitialized field was found within the 
region. It should only be used with regions of union-type objects".


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:170-171
+
+/// Returns the with the object that was constructed by CtorDecl, or None if
+/// that isn't possible.
+Optional<nonloc::LazyCompoundVal>
----------------
Something's missing before `with`.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:176-177
+/// Checks whether the constructor under checking is called by another
+/// constructor. This avoids essentially the same error being reported multiple
+/// times.
+bool isCalledByConstructor(const CheckerContext &Context);
----------------
The second sentence would look great at the call site.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:181-182
+/// Returns whether FD can be (transitively) dereferenced to a void pointer 
type
+/// (void*, void**, ...). The type of the region behind a void pointer isn't
+/// known, and thus FD can not be analyzed.
+bool isVoidPointer(const FieldDecl *FD);
----------------
Type of an object pointed to by a void pointer is something that our 
path-sensitive engine is sometimes able to provide. It is not uncommon that a 
void pointer points to a concrete variable on the current path, and we may know 
it in the analyzer. I'm not sure this sort of check is necessary (i.e. require 
more information).

You may also want to have a look at `getDynamicTypeInfo()` which is sometimes 
capable of retrieving the dynamic type of the object pointed to by a pointer or 
a reference - i.e. even if it's different from the pointee type of the pointer.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:185
+
+} // namespace
+
----------------
We're traditionally writing "end anonymous namespace" or "end of anonymous 
namespace".


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:223-225
+  ExplodedNode *Node = Context.generateNonFatalErrorNode(Context.getState());
+  if (!Node)
+    return;
----------------
I suspect that a fatal error is better here. We don't want the user to receive 
duplicate report from other checkers that catch uninitialized values; just one 
report is enough.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:233-236
+  llvm::Twine WarningMsg = llvm::Twine(UninitFields.size()) +
+                           llvm::Twine(" uninitialized field") +
+                           llvm::Twine(UninitFields.size() == 1 ? "" : "s") +
+                           llvm::Twine(" at the end of the constructor call");
----------------
I believe that you only need the first `Twine`; then all `operator+` calls will 
be resolved to `operator+(Twine, ...)` automatically. Or just use a string and 
a stream like pretty much all other checkers.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:244
+    SmallString<200> FieldChainBuf;
+    FieldChain.toString(FieldChainBuf);
+    SmallString<200> NoteBuf;
----------------
This is the third approach to concatenating strings in the last 10 lines of 
code. I think you should make a method `FieldChainInfo::print(llvm::raw_ostream 
&)` and use streams everywhere.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:258-260
+    Report->addNote(NoteBuf,
+                    PathDiagnosticLocation::create(FieldChain.getEndOfChain(),
+                                                   
Context.getSourceManager()));
----------------
Aha, ok, got it, so you're putting the warning at the end of the constructor 
and squeezing all uninitialized fields into a single warning by presenting them 
as notes.

Because for now notes aren't supported everywhere (and aren't always supported 
really well), i guess it'd be necessary to at least provide an option to make 
note-less reports before the checker is enabled by default everywhere. In this 
case notes contain critical pieces of information and as such cannot be really 
discarded.

I'm not sure that notes are providing a better user experience here than simply 
saying that "field x.y->z is never initialized". With notes, the user will have 
to scroll around (at least in scan-build) to find which field is uninitialized.

Notes will probably also not decrease the number of reports too much because in 
most cases there will be just one uninitialized field. Though i agree that when 
there is more than one report, the user will be likely to deal with all fields 
at once rather than separately.

So it's a bit wonky.

We might want to enable one mode or another in the checker depending on the 
analyzer output flag. Probably in the driver (`RenderAnalyzerOptions()`).

It'd be a good idea to land the checker into alpha first and fix this in a 
follow-up patch because this review is already overweight.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:292
+  if (!IsChecked)
+    hasUnintializedFields();
+  return UninitFields;
----------------
Why is checking done lazily in the first place? Are there many situations in 
which we want to construct the object but never do the actual scan?

If it was actually necessary, i would have split out the side effect of this 
getter into a separate private method.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:346
+    // If this is a pointer or reference type.
+    if (V.getAs<Loc>()) {
+      if (isPointerOrReferenceUninit(FR, {LocalChain, FR}))
----------------
Please check the type explicitly, and then assert that the value is a `Loc`. 
There are a lot of nuances that you don't want to be dealing with here (eg. 
`ObjCObjectPointerType`, which can be present in C++ as well because 
Objective-C++ is a thing, is not a `PointerType` or a `ReferenceType`, but it 
is modeled with a `Loc`).


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:352
+
+    // At this point the field is a fundamental type.
+    if (isFundamentalUninit(V)) {
----------------
Please formalize what you mean by a fundamental type and assert that. You 
probably mean something like "an integer or a floating-point number or maybe an 
enum or, well, bool".

I suspect that many cases here are missed (complex types, vector types, ...).


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:359-372
+  // Checking bases.
+  const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD);
+  if (!CXXRD)
+    return ContainsUninitField;
+
+  for (const CXXBaseSpecifier BaseSpec : CXXRD->bases()) {
+    const auto *Base = BaseSpec.getType()->getAsCXXRecordDecl();
----------------
Are there many warnings that will be caused by this but won't be caused by 
conducting the check for the constructor-within-constructor that's currently 
disabled? Not sure - is it even syntactically possible to not initialize the 
base class?


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:384
+bool FindUninitializedFields::isPointerOrReferenceUninit(
+    const FieldRegion *FR, FieldChainInfo LocalChain) {
+
----------------
Please assert here that `FR` is a pointer/reference-type field.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:386
+
+  SVal V = StoreMgr.getBinding(S, loc::MemRegionVal(FR));
+
----------------
As i mentioned above, you can simply store a `ProgramStateRef` within 
`FindUninitializedFields`. In this case this code will be simplified to 
`State->getSVal(FR)`.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:405-406
+  SVal DerefdV = StoreMgr.getBinding(S, V.castAs<Loc>());
+  while (Optional<Loc> L = DerefdV.getAs<Loc>())
+    DerefdV = StoreMgr.getBinding(S, *L);
+
----------------
Is this loop guaranteed to terminate? Is something like `void *v; v = &v;` 
possible?

I think this loop should unwrap the AST type on every iteration and dereference 
the pointer accordingly.

In general, i believe that every symbolic pointer dereference should be made 
with awareness of the type of the object we're trying to retrieve. Which is an 
optional argument for `State->getSVal(R, ...)`, but it seems that it should be 
made mandatory because in a lot of cases omitting it causes problems.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:417
+    // constructors for list-like classes are checked without being called, and
+    // the CSA will conjure a symbolic region for Node *next; or similar code
+    // snippets.
----------------
CSA is not a commonly used acronym around the codebase.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:417
+    // constructors for list-like classes are checked without being called, and
+    // the CSA will conjure a symbolic region for Node *next; or similar code
+    // snippets.
----------------
NoQ wrote:
> CSA is not a commonly used acronym around the codebase.
I'd like to change the word "conjure" to something else because in most cases 
it won't be an actual `SymbolConjured`, but it'd be a `SymbolRegionValue` or 
`SymbolDerived`. Conjuring occurs in different circumstances.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:430
+    if (T->isUnionType()) {
+      // TODO: does control ever reach here?
+      if (isUnionUninit(R)) {
----------------
Add `llvm_unreachable("")` to find out :)


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:432
+      if (isUnionUninit(R)) {
+        LocalChain.dereference();
+        return addFieldToUninits(std::move(LocalChain));
----------------
Needs a comment on why the `IsDereferenced` flag on this path.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:480
+
+// TODO: This function constructs an incorrect fieldchain string in the
+// following case:
----------------
Can we at least detect these situations and suppress the warning? It should be 
relatively easy for lambdas (captured variable mapping should be stored 
somewhere, as far as i remember), not sure how easy it'd be for double 
inheritance (there are fancy methods for scanning this stuff, but that's 
equivalent to actually implementing the correct warning message).

Weird warning messages are pretty scary to show to the users.

It'd be a good idea to land the checker into alpha first and fix this in a 
follow-up patch because this review is already overweight.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:559-560
+  while (LC) {
+    if (isa<CXXConstructorDecl>(LC->getDecl()))
+      return true;
+
----------------
This doesn't check that the constructor on the parent stack frame is anyhow 
related to the current constructor, so it may introduce false negatives. For 
instance, a class may lazy-initialize a singleton but never store a reference 
to it within itself (because, well, it's a singleton, you can always obtain the 
reference to it). In this case we'll never find the bug in the constructor of 
the singleton.


================
Comment at: test/Analysis/cxx-uninitialized-object.cpp:931
+// FIXME: As of writing this checker, there is no good support for union types
+// in the CSA. Here is non-exhaustive list of cases.
+// Note that the rules for unions are different in C++ and C.
----------------
CSA is not a commonly used acronym around the codebase.


https://reviews.llvm.org/D45532



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to