NoQ created this revision.
NoQ added reviewers: dcoughlin, xazax.hun, a.sidorin, george.karpenkov, szepet, 
rnkovacs.
Herald added subscribers: cfe-commits, baloghadamsoftware.

As the old assertion says,

> "Thou shalt not initialize the same memory twice."

Namely, the `State->bindDefault()`/`StoreManager.BindDefault()` API has the 
purpose of overriding the initial value with which memory is initialized. For 
instance, a checker can express that loading from freshly malloc()ed memory 
would yield Undefined, while loading from freshly calloc()ed memory would yield 
zero, even though loading from a normal symbolic region would yield a 
region-value symbol. `BindDefault` never was meant for binding arbitrary 
default values to arbitrary locations (though it was never doxygen'ed 
anywhere). For that reason it has the aforementioned assertion.

The assertion is actually a bit more relaxed than that; it simply says that the 
new default binding should never overwrite an existing binding key, which means 
that not only the base region but also the offset should coincide in order to 
trigger the assertion.

Other sorts of default store bindings are created by different APIs. Namely, 
there's also the invalidation case, which consists in default-binding a 
conjured symbol, and it is performed by 
`State->invalidateRegions()`/`StoreManager.invalidateRegions()` which never 
calls `BindDefault` but instead manipulates bindings directly.

Now, when it comes to C++ zeroing constructors (i.e. non-user-supplied 
constructors that boil down to zero initialization of the object), a lot of 
weird things can happen, especially when such constructors are applied to 
fields or base objects, especially with multiple inheritance, especially with 
default inheritance from empty objects (objects with no fields, eg. a-la java 
interfaces). The same base object would undergo multiple initializations for 
every zeroing constructor of a field or a base. And due to empty fields or 
bases, it may happen that such bindings will have the exact same offset. It may 
also happen that the new default binding would conflict an old direct binding, 
as i demonstrated on one of the newly added tests.

It seems that C++ zero-initialization is quite different from normal 
`BindDefault`'s idea of setting the initial/default value. Therefore i propose 
to split up the `BindDefault` API into two parts: `BindDefaultInitial()` for 
setting initial/default values and `BindDefaultZero()` for performing C++-like 
zero initialization specifically. This would leave us with 3 different APIs for 
manipulating default bindings, but i guess it's not that bad because these 
different APIs are quite good at expressing the coder's intent while hiding 
internals of the `RegionStore`, and their implementations are significantly 
different.

----

Now, recall that there was a previous attempt to relax that assertion in 
https://reviews.llvm.org/rC190530, for the same reason. The separation of APIs 
proposed here would allow us to restore the historical assertion in its 
original strength, while accepting that the assertion doesn't cover C++ 
zero-initialization. I'm not seeing a good way to cover the zero-initialization 
case with that, and i'm not convinced that it's necessary.

The interesting part here is, however, the special-case behavior that was 
chosen on the path on which the assertion did not hold. It was pointed out by 
Jordan that if a zeroing constructor follows a conservatively evaluated 
empty-base constructor, it is more correct to preserve the default conjured 
symbol produced by the empty-base constructor than to replace it with a zero 
default binding that would completely overwrite the conjured symbol, since //we 
don't model store binding extents//. This is more correct because 
zero-initializer should only cover its respective sub-object, but other parts 
of the big object must remain invalidated.

In the newly added tests i point out that keeping the original conjured symbol 
in that case would also not be fully safe. The actually-safe behavior would be 
to invalidate the object again, because it's no longer the same unknown value 
that we previously had. So i'm adding a bunch of tests in which we take various 
field values in different weird moments of time and see if we aren't 
accidentally assuming that they're the same known or unknown values as the ones 
taken in different moments of time.

These tests now also demonstrate another bug that would also make our store 
bindings incorrectly, namely by looking at `ASTRecordLayout` it is not always 
apparent what part of the object is going to be overwritten by making a binding 
to a specific sub-region. See the new architecture-specific tests in which 
field `z` misbehaves.

Long story short, `RegionStore` is modeling zero initializations imperfectly 
for at least two reasons.

We could work around that by invalidating object contents every time we detect 
something fishy. The only reasonable detection i can come up with would be to 
detect if we have a coinciding binding (i.e. what the assertion was previously 
protecting us from), eg.:

  if (B.getDefaultBinding(R) || B.getDirectBinding(R))
    V = UnknownVal();

This detection is imperfect - there are tests that would still fail. In 
particular it won't fix the `ASTRecordLayout` problem. And i also suspect that 
it really destroys a lot more good data than bad data. So i'll be experimenting 
with that now. In the current version of the patch i remove the special case 
handling and simply bind the default value blindly even if i know that 
`RegionStore` can't handle it precisely.


Repository:
  rC Clang

https://reviews.llvm.org/D46368

Files:
  include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
  include/clang/StaticAnalyzer/Core/PathSensitive/Store.h
  lib/StaticAnalyzer/Checkers/MallocChecker.cpp
  lib/StaticAnalyzer/Core/ExprEngine.cpp
  lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
  lib/StaticAnalyzer/Core/ProgramState.cpp
  lib/StaticAnalyzer/Core/RegionStore.cpp
  lib/StaticAnalyzer/Core/Store.cpp
  test/Analysis/ctor.mm

Index: test/Analysis/ctor.mm
===================================================================
--- test/Analysis/ctor.mm
+++ test/Analysis/ctor.mm
@@ -1,5 +1,7 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -fobjc-arc -analyzer-config c++-inlining=constructors -Wno-null-dereference -std=c++11 -verify %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -fobjc-arc -analyzer-config c++-inlining=constructors -Wno-null-dereference -std=c++11 -verify -DTEST_INLINABLE_ALLOCATORS %s
+// RUN: %clang_analyze_cc1 -triple i386-apple-darwin10 -DI386 -analyzer-checker=core,debug.ExprInspection -fobjc-arc -analyzer-config c++-inlining=constructors -Wno-null-dereference -std=c++11 -verify %s
+// RUN: %clang_analyze_cc1 -triple i386-apple-darwin10 -DI386 -analyzer-checker=core,debug.ExprInspection -fobjc-arc -analyzer-config c++-inlining=constructors -Wno-null-dereference -std=c++11 -verify -DTEST_INLINABLE_ALLOCATORS %s
+// RUN: %clang_analyze_cc1 -triple x86_64-apple-darwin12 -analyzer-checker=core,debug.ExprInspection -fobjc-arc -analyzer-config c++-inlining=constructors -Wno-null-dereference -std=c++11 -verify %s
+// RUN: %clang_analyze_cc1 -triple x86_64-apple-darwin12 -analyzer-checker=core,debug.ExprInspection -fobjc-arc -analyzer-config c++-inlining=constructors -Wno-null-dereference -std=c++11 -verify -DTEST_INLINABLE_ALLOCATORS %s
 
 #include "Inputs/system-header-simulator-cxx.h"
 
@@ -638,21 +640,68 @@
 
   class Empty {
   public:
-    Empty();
+    static int glob;
+    Empty(); // No body.
+    Empty(int x); // Body below.
   };
 
   class PairContainer : public Empty {
-    raw_pair p;
   public:
+    raw_pair p;
+    int q;
     PairContainer() : Empty(), p() {
       // This previously caused a crash because the empty base class looked
       // like an initialization of 'p'.
     }
     PairContainer(int) : Empty(), p() {
       // Test inlining something else here.
     }
+    PairContainer(double): Empty(1), p() {
+      clang_analyzer_eval(p.p1 == 0); // expected-warning{{TRUE}}
+      clang_analyzer_eval(p.p2 == 0); // expected-warning{{TRUE}}
+
+      clang_analyzer_eval(q == 1); // expected-warning{{TRUE}}
+
+      // This one's indeed UNKNOWN. Definitely not TRUE.
+      clang_analyzer_eval(p.p2 == glob); // expected-warning{{UNKNOWN}}
+    }
   };
 
+  Empty::Empty(int x) {
+    static_cast<PairContainer *>(this)->p.p1 = x;
+    static_cast<PairContainer *>(this)->q = x;
+    // Our static member will store the old garbage values of fields that aren't
+    // yet initialized. It's not certainly garbage though (i.e. the constructor
+    // could have been called on an initialized piece of memory), so no
+    // uninitialized value warning here, and it should be a symbol, not
+    // undefined value, for later comparison.
+    glob = static_cast<PairContainer *>(this)->p.p2;
+  }
+
+	class Empty2 {
+	public:
+		static int glob_p1, glob_p2;
+		Empty2(); // Body below.
+	};
+
+	class PairDoubleEmptyContainer: public Empty, public Empty2 {
+	public:
+    raw_pair p;
+		PairDoubleEmptyContainer(): Empty(), Empty2(), p() {
+      clang_analyzer_eval(p.p1 == 0); // expected-warning{{TRUE}}
+      clang_analyzer_eval(p.p2 == 0); // expected-warning{{TRUE}}
+
+      // This is indeed UNKNOWN.
+      clang_analyzer_eval(p.p1 == glob_p1); // expected-warning{{UNKNOWN}}
+      clang_analyzer_eval(p.p2 == glob_p2); // expected-warning{{UNKNOWN}}
+		}
+	};
+
+	Empty2::Empty2() {
+    glob_p1 = static_cast<PairDoubleEmptyContainer *>(this)->p.p1;
+    glob_p2 = static_cast<PairDoubleEmptyContainer *>(this)->p.p2;
+	}
+
   class PairContainerContainer {
     int padding;
     PairContainer pc;
@@ -749,3 +798,67 @@
   clang_analyzer_eval(d2.x == 1); // expected-warning{{TRUE}}
 }
 }
+
+namespace vbase_zero_init {
+class A {
+  virtual void foo();
+};
+
+class B {
+  virtual void bar();
+public:
+  static int glob_y, glob_z, glob_w;
+  int x;
+  B(); // Body below.
+};
+
+class C : virtual public A {
+public:
+  int y;
+};
+
+class D : public B, public C {
+public:
+  // 'z', unlike 'w', resides in an area that would have been within padding of
+  // base class 'C' if it wasn't part of 'D', but only on 64-bit systems.
+  int z, w;
+  // Initialization order: A(), B(), C().
+  D() : A(), C() {
+    clang_analyzer_eval(x == 1); // expected-warning{{TRUE}}
+    clang_analyzer_eval(y == 0); // expected-warning{{TRUE}}
+#ifdef I386
+    clang_analyzer_eval(z == 3); // expected-warning{{TRUE}}
+#else
+    // FIXME: Should be TRUE. Initialized in B().
+    clang_analyzer_eval(z == 3); // expected-warning{{UNKNOWN}}
+#endif
+    clang_analyzer_eval(w == 4); // expected-warning{{TRUE}}
+
+    // FIXME: Should be UNKNOWN. Changed in B() since glob_y was assigned.
+    clang_analyzer_eval(y == glob_y); // expected-warning{{TRUE}}
+
+#ifdef I386
+    clang_analyzer_eval(z == glob_z); // expected-warning{{UNKNOWN}}
+#else
+    // FIXME: Should be UNKNOWN. Changed in B() since glob_z was assigned.
+    clang_analyzer_eval(z == glob_z); // expected-warning{{TRUE}}
+#endif
+
+    clang_analyzer_eval(w == glob_w); // expected-warning{{UNKNOWN}}
+  } // no-crash
+};
+
+B::B() : x(1) {
+  // Our static members will store the old garbage values of fields that aren't
+  // yet initialized. These aren't certainly garbage though (i.e. the
+  // constructor could have been called on an initialized piece of memory),
+  // so no uninitialized value warning here, and these should be symbols, not
+  // undefined values, for later comparison.
+  glob_y = static_cast<D *>(this)->y;
+  glob_z = static_cast<D *>(this)->z;
+  glob_w = static_cast<D *>(this)->w;
+  static_cast<D *>(this)->y = 2;
+  static_cast<D *>(this)->z = 3;
+  static_cast<D *>(this)->w = 4;
+}
+}
Index: lib/StaticAnalyzer/Core/Store.cpp
===================================================================
--- lib/StaticAnalyzer/Core/Store.cpp
+++ lib/StaticAnalyzer/Core/Store.cpp
@@ -65,10 +65,6 @@
   return MRMgr.getElementRegion(EleTy, idx, Base, svalBuilder.getContext());
 }
 
-StoreRef StoreManager::BindDefault(Store store, const MemRegion *R, SVal V) {
-  return StoreRef(store, *this);
-}
-
 const ElementRegion *StoreManager::GetElementZeroRegion(const SubRegion *R,
                                                         QualType T) {
   NonLoc idx = svalBuilder.makeZeroArrayIndex();
Index: lib/StaticAnalyzer/Core/RegionStore.cpp
===================================================================
--- lib/StaticAnalyzer/Core/RegionStore.cpp
+++ lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -409,8 +409,22 @@
 
   RegionBindingsRef bind(RegionBindingsConstRef B, Loc LV, SVal V);
 
-  // BindDefault is only used to initialize a region with a default value.
-  StoreRef BindDefault(Store store, const MemRegion *R, SVal V) override {
+  // BindDefaultInitial is only used to initialize a region with
+  // a default value.
+  StoreRef BindDefaultInitial(Store store, const MemRegion *R,
+                              SVal V) override {
+    RegionBindingsRef B = getRegionBindings(store);
+    // Use other APIs when you have to wipe the region that was initialized
+    // earlier.
+    assert(!(B.getDefaultBinding(R) || B.getDirectBinding(R)) &&
+           "Double initialization!");
+    B = B.addBinding(BindingKey::Make(R, BindingKey::Default), V);
+    return StoreRef(B.asImmutableMap().getRootWithoutRetain(), *this);
+  }
+
+  // BindDefaultZero is used for zeroing constructors that may accidentally
+  // overwrite existing bindings.
+  StoreRef BindDefaultZero(Store store, const MemRegion *R) override {
     // FIXME: The offsets of empty bases can be tricky because of
     // of the so called "empty base class optimization".
     // If a base class has been optimized out
@@ -420,24 +434,14 @@
     // and trying to infer them from offsets/alignments
     // seems to be error-prone and non-trivial because of the trailing padding.
     // As a temporary mitigation we don't create bindings for empty bases.
-    if (R->getKind() == MemRegion::CXXBaseObjectRegionKind &&
-        cast<CXXBaseObjectRegion>(R)->getDecl()->isEmpty())
-      return StoreRef(store, *this);
+    if (const auto *BR = dyn_cast<CXXBaseObjectRegion>(R))
+      if (BR->getDecl()->isEmpty())
+        return StoreRef(store, *this);
 
     RegionBindingsRef B = getRegionBindings(store);
-    assert(!B.lookup(R, BindingKey::Direct));
-
-    BindingKey Key = BindingKey::Make(R, BindingKey::Default);
-    if (B.lookup(Key)) {
-      const SubRegion *SR = cast<SubRegion>(R);
-      assert(SR->getAsOffset().getOffset() ==
-             SR->getSuperRegion()->getAsOffset().getOffset() &&
-             "A default value must come from a super-region");
-      B = removeSubRegionBindings(B, SR);
-    } else {
-      B = B.addBinding(Key, V);
-    }
-
+    SVal V = svalBuilder.makeZeroVal(Ctx.CharTy);
+    B = removeSubRegionBindings(B, cast<SubRegion>(R));
+    B = B.addBinding(BindingKey::Make(R, BindingKey::Default), V);
     return StoreRef(B.asImmutableMap().getRootWithoutRetain(), *this);
   }
 
Index: lib/StaticAnalyzer/Core/ProgramState.cpp
===================================================================
--- lib/StaticAnalyzer/Core/ProgramState.cpp
+++ lib/StaticAnalyzer/Core/ProgramState.cpp
@@ -126,16 +126,27 @@
   return newState;
 }
 
-ProgramStateRef ProgramState::bindDefault(SVal loc,
-                                          SVal V,
-                                          const LocationContext *LCtx) const {
+ProgramStateRef
+ProgramState::bindDefaultInitial(SVal loc, SVal V,
+                                 const LocationContext *LCtx) const {
+  ProgramStateManager &Mgr = getStateManager();
+  const MemRegion *R = loc.castAs<loc::MemRegionVal>().getRegion();
+  const StoreRef &newStore = Mgr.StoreMgr->BindDefaultInitial(getStore(), R, V);
+  ProgramStateRef new_state = makeWithStore(newStore);
+  return Mgr.getOwningEngine()
+             ? Mgr.getOwningEngine()->processRegionChange(new_state, R, LCtx)
+             : new_state;
+}
+
+ProgramStateRef
+ProgramState::bindDefaultZero(SVal loc, const LocationContext *LCtx) const {
   ProgramStateManager &Mgr = getStateManager();
   const MemRegion *R = loc.castAs<loc::MemRegionVal>().getRegion();
-  const StoreRef &newStore = Mgr.StoreMgr->BindDefault(getStore(), R, V);
+  const StoreRef &newStore = Mgr.StoreMgr->BindDefaultZero(getStore(), R);
   ProgramStateRef new_state = makeWithStore(newStore);
-  return Mgr.getOwningEngine() ?
-           Mgr.getOwningEngine()->processRegionChange(new_state, R, LCtx) :
-           new_state;
+  return Mgr.getOwningEngine()
+             ? Mgr.getOwningEngine()->processRegionChange(new_state, R, LCtx)
+             : new_state;
 }
 
 typedef ArrayRef<const MemRegion *> RegionList;
Index: lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
===================================================================
--- lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
+++ lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
@@ -375,9 +375,6 @@
          I != E; ++I) {
       ProgramStateRef State = (*I)->getState();
       if (CE->requiresZeroInitialization()) {
-        // Type of the zero doesn't matter.
-        SVal ZeroVal = svalBuilder.makeZeroVal(getContext().CharTy);
-
         // FIXME: Once we properly handle constructors in new-expressions, we'll
         // need to invalidate the region before setting a default value, to make
         // sure there aren't any lingering bindings around. This probably needs
@@ -390,7 +387,7 @@
         // actually make things worse. Placement new makes this tricky as well,
         // since it's then possible to be initializing one part of a multi-
         // dimensional array.
-        State = State->bindDefault(loc::MemRegionVal(Target), ZeroVal, LCtx);
+        State = State->bindDefaultZero(loc::MemRegionVal(Target), LCtx);
       }
 
       State = addAllNecessaryTemporaryInfo(State, CC, LCtx, Target);
Index: lib/StaticAnalyzer/Core/ExprEngine.cpp
===================================================================
--- lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -348,7 +348,9 @@
       break;
     case SubobjectAdjustment::MemberPointerAdjustment:
       // FIXME: Unimplemented.
-      State = State->bindDefault(Reg, UnknownVal(), LC);
+      State = State->invalidateRegions(Reg, InitWithAdjustments,
+                                       currBldrCtx->blockCount(), LC, true,
+                                       nullptr, nullptr, nullptr);
       return State;
     }
   }
Index: lib/StaticAnalyzer/Checkers/MallocChecker.cpp
===================================================================
--- lib/StaticAnalyzer/Checkers/MallocChecker.cpp
+++ lib/StaticAnalyzer/Checkers/MallocChecker.cpp
@@ -1273,7 +1273,7 @@
   State = State->BindExpr(CE, C.getLocationContext(), RetVal);
 
   // Fill the region with the initialization value.
-  State = State->bindDefault(RetVal, Init, LCtx);
+  State = State->bindDefaultInitial(RetVal, Init, LCtx);
 
   // Set the region's extent equal to the Size parameter.
   const SymbolicRegion *R =
Index: include/clang/StaticAnalyzer/Core/PathSensitive/Store.h
===================================================================
--- include/clang/StaticAnalyzer/Core/PathSensitive/Store.h
+++ include/clang/StaticAnalyzer/Core/PathSensitive/Store.h
@@ -107,7 +107,15 @@
   ///   by \c val bound to the location given for \c loc.
   virtual StoreRef Bind(Store store, Loc loc, SVal val) = 0;
 
-  virtual StoreRef BindDefault(Store store, const MemRegion *R, SVal V);
+  /// Return a store with the specified value bound to all sub-regions of the
+  /// region. The region must not have previous bindings. If you need to
+  /// invalidate existing bindings, consider invalidateRegions().
+  virtual StoreRef BindDefaultInitial(Store store, const MemRegion *R,
+                                      SVal V) = 0;
+
+  /// Return a store with in which all values within the given region are
+  /// reset to zero. This method is allowed to overwrite previous bindings.
+  virtual StoreRef BindDefaultZero(Store store, const MemRegion *R) = 0;
 
   /// \brief Create a new store with the specified binding removed.
   /// \param ST the original store, that is the basis for the new store.
Index: include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
===================================================================
--- include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+++ include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
@@ -242,7 +242,18 @@
 
   ProgramStateRef bindLoc(SVal location, SVal V, const LocationContext *LCtx) const;
 
-  ProgramStateRef bindDefault(SVal loc, SVal V, const LocationContext *LCtx) const;
+  /// Initializes the region of memory represented by \p loc with an initial
+  /// value. Once initialized, all values loaded from any sub-regions of that
+  /// region will be equal to \p V, unless overwritten later by the program.
+  /// This method should not be used on regions that are already initialized.
+  /// If you need to indicate that memory contents have suddenly become unknown
+  /// within a certain region of memory, consider invalidateRegions().
+  ProgramStateRef bindDefaultInitial(SVal loc, SVal V,
+                                     const LocationContext *LCtx) const;
+
+  /// Performs C++ zero-initialization procedure on the region of memory
+  /// represented by \p loc.
+  ProgramStateRef bindDefaultZero(SVal loc, const LocationContext *LCtx) const;
 
   ProgramStateRef killBinding(Loc LV) const;
 
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to