NoQ created this revision.
NoQ added reviewers: dcoughlin, xazax.hun, a.sidorin, george.karpenkov, szepet.
Herald added subscribers: cfe-commits, rnkovacs, baloghadamsoftware.

I've been investigating a false positive that had a pointer-type symbol with a 
non-zero range which had its range forgotten and later assumed to be zero. The 
big test that i added 
(`test_region_referenced_only_through_field_in_store_value()`) roughly 
re-creates the infeasible path that was taken. A bit simplified, it looks like 
this:

   1 int *p;
   2
   3 struct S {
   4   int field;
   5 };
   6
   7 struct S *conjure();
   8
   9 void foo() {
  10   struct S *s = conjure();
  11   p = &s->field;
  12   if (!s) {
  13     exit(0);
  14   }
  15   if (!p) {
  16     clang_analyzer_warnIfReached(); // no-warning
  17   }
  18 }

It is easy to see that the path with `clang_analyzer_warnIfReached()` is 
infeasible. The bug that causes us to explore that infeasible path is in the 
`RegionStore`'s dead symbol removal algorithm that didn't realize that the 
symbol is live. Recall that `RegionStore` scans itself with the help of a 
worklist algorithm - it enforces the invariant of //"if a binding key is live, 
then the respective binding value is also live"// (and a few other similar 
invariants) and marks symbols and regions as live until the live set reaches a 
fixed point (i.e. the worklist runs dry).

In our example the symbol in question is the return value of `conjure_S1()`. 
We'd still refer to it as `$x`.

1. `$x` is present in the Store as a binding value of form 
`&SymRegion{$x}->field` (which is equivalent to `$x + offsetof(field)`, i.e. 
it's possible to easily recover `$x` from it).
2. Symbol `$x` is not mentioned anywhere else in the program state.
3. Additionally, there are no store bindings anywhere within region 
`SymRegion{$x}`.
4. When `&SymRegion{$x}->field` is found as a binding value in 
`removeDeadBindingsWorker::VisitBinding()`, symbolic base `SymRegion{$x}` of 
region `SymRegion{$x}->field` is put into the worklist 
(`removeDeadBindingsWorker::AddToWorkList()` descends to the base region 
automatically).
5. However, `$x` is not added to the live set yet, because the loop through 
sub-symbols (from `symbol_begin()` to `symbol_end()`) doesn't descend to from 
`SymRegion{$x}->field` to its symbolic base `SymRegion{$x}` in order to find 
`$x`.
6. In most cases `$x` will be found anyway when later processing the worklist. 
Indeed, `removeDeadBindingsWorker::VisitCluster()` would take `SymRegion{$x}` 
from the worklist and immediately find `$x` within it. This is actually one of 
the "other similar invariants": //"if a binding key is live and is a symbolic 
region, its parent symbol is live"//.
7. In our case, however, this will not happen, because `SymRegion{$x}` has no 
bindings and therefore its cluster is empty and 
`removeDeadBindingsWorker::VisitCluster()` will bail out early.

Essentially, it worked most of the time because most regions have bindings. But 
normally it is not a responsibility of 
`removeDeadBindingsWorker::VisitCluster()` to find symbol `$x` within binding 
value `SymRegion{$x}->field` - it should definitely not try to care about the 
liveness of stuff in empty clusters, those should only be live if there are 
other reasons to believe they're alive. Being a binding value is one such 
"other reason", which means that on step 5 we should have added `$x` to the 
live set.

I believe that this is exactly how step 5 was intended to work. However, the 
undocumented behavior of `symbol_begin()` that consists in not descending to 
the base region was unexpected.

-----

In order to confirm my understanding, i tried to see what other users of 
`symbol_begin()` expect. It turns out that three other users made the same 
mistake, and in two of these three cases it led to keeping the symbol around 
longer (instead shorter) than necessary (i.e. zombie symbols), while the third 
case is similar to the original example albeit more exotic. In all other cases 
users of `symbol_begin()` did not care if it descends to the base region or not.

Namely:

- `CStringChecker::checkLiveSymbols()` iterates over sub-symbols of string 
length to mark string lengths as live. Because string length is for now only 
composed of `CStringChecker`-tagged metadata symbols and constants, and 
therefore it is a `NonLoc` that isn't a `nonloc::LocAsInteger`, it is 
irrelevant whether `symbol_begin` descends to the symbolic base.
- `Environment::removeDeadBindings()` marks symbols within a value of a dead 
expression as `maybeDead()`. I added a test case, 
`test_region_referenced_only_through_field_in_environment_value()`, to 
demonstrate that we indeed produce a zombie symbol due to `symbol_begin()` here 
not descending to the symbolic base.
- `ScanReachableSymbols::scan(Sym)` scans sub-symbols of a raw symbol, not of 
an `SVal`. Therefore it is not a region and there is no need to descend to the 
symbolic base.
- `ProgramState::isTainted(Sym, Kind)` scans a raw symbol as well.
- `removeDeadBindingsWorker::VisitBinding()` the test case we've just discussed.
- `RegionStoreManager::removeDeadBindings` also has a bug similar to the one in 
`Environment::removeDeadBindings()`, which is demonstrated by test case 
`test_zombie_referenced_only_through_field_in_store_value()`.
- `SymExpr::computeComplexity()` scans a raw symbol.
- `SymbolReaper::markElementIndicesLive()` scans an array symbol, which is 
always a `NonLoc`. However, it may still be a `nonloc::LocAsInteger`, and in 
this case descend to the symbolic base would be necessary, as demonstrated by 
`test_loc_as_integer_element_index_lifetime()`. This test, however, is not 
fixed by the current patch, due to `getAsLocSymbol()` incorrectly failing to 
pass its `IncludeBaseRegions` argument into the recursive call. But i decided 
not to investigate it further today, because, well, //enough is enough//.


Repository:
  rC Clang

https://reviews.llvm.org/D44347

Files:
  include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
  test/Analysis/symbol-reaper.c


Index: test/Analysis/symbol-reaper.c
===================================================================
--- test/Analysis/symbol-reaper.c
+++ test/Analysis/symbol-reaper.c
@@ -3,6 +3,9 @@
 void clang_analyzer_eval(int);
 void clang_analyzer_warnOnDeadSymbol(int);
 void clang_analyzer_numTimesReached();
+void clang_analyzer_warnIfReached();
+
+void exit(int);
 
 int conjure_index();
 
@@ -58,6 +61,12 @@
 struct S2 {
   struct S1 array[5];
 } s2;
+struct S3 {
+  void *field;
+};
+
+struct S1 *conjure_S1();
+struct S3 *conjure_S3();
 
 void test_element_index_lifetime_with_complicated_hierarchy_of_regions() {
   do {
@@ -68,6 +77,18 @@
   } while (0); // no-warning
 }
 
+void test_loc_as_integer_element_index_lifetime() {
+  do {
+    int x;
+    struct S3 *s = conjure_S3();
+    clang_analyzer_warnOnDeadSymbol((int)s);
+    x = (int)&(s->field);
+    ptr = &arr[x];
+    if (!s) {}
+  // FIXME: Should not warn. The symbol is still alive within the ptr's index.
+  } while (0); // expected-warning{{SYMBOL DEAD}}
+}
+
 // Test below checks lifetime of SymbolRegionValue in certain conditions.
 
 int **ptrptr;
@@ -78,3 +99,38 @@
   (void)0; // No-op; make sure the environment forgets things and the GC runs.
   clang_analyzer_eval(**ptrptr); // expected-warning{{TRUE}}
 } // no-warning
+
+int *produce_region_referenced_only_through_field_in_environment_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  int *x = &s->field;
+  return x;
+}
+
+void test_region_referenced_only_through_field_in_environment_value() {
+  produce_region_referenced_only_through_field_in_environment_value();
+} // expected-warning{{SYMBOL DEAD}}
+
+void test_region_referenced_only_through_field_in_store_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  ptr = &s->field; // Write the symbol into a global. It should live forever.
+  if (!s) {
+    exit(0); // no-warning (symbol should not die here)
+    // exit() is noreturn.
+    clang_analyzer_warnIfReached(); // no-warning
+  }
+  if (!ptr) { // no-warning (symbol should not die here)
+    // We exit()ed under these constraints earlier.
+    clang_analyzer_warnIfReached(); // no-warning
+  }
+  // The exit() call invalidates globals. The symbol will die here because
+  // the exit() statement itself is already over and there's no better 
statement
+  // to put the diagnostic on.
+} // expected-warning{{SYMBOL DEAD}}
+
+void test_zombie_referenced_only_through_field_in_store_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  int *x = &s->field;
+} // expected-warning{{SYMBOL DEAD}}
Index: include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
===================================================================
--- include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
+++ include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
@@ -183,7 +183,7 @@
   void dump() const;
 
   SymExpr::symbol_iterator symbol_begin() const {
-    const SymExpr *SE = getAsSymbolicExpression();
+    const SymExpr *SE = getAsSymbol(/*IncludeBaseRegions=*/true);
     if (SE)
       return SE->symbol_begin();
     else


Index: test/Analysis/symbol-reaper.c
===================================================================
--- test/Analysis/symbol-reaper.c
+++ test/Analysis/symbol-reaper.c
@@ -3,6 +3,9 @@
 void clang_analyzer_eval(int);
 void clang_analyzer_warnOnDeadSymbol(int);
 void clang_analyzer_numTimesReached();
+void clang_analyzer_warnIfReached();
+
+void exit(int);
 
 int conjure_index();
 
@@ -58,6 +61,12 @@
 struct S2 {
   struct S1 array[5];
 } s2;
+struct S3 {
+  void *field;
+};
+
+struct S1 *conjure_S1();
+struct S3 *conjure_S3();
 
 void test_element_index_lifetime_with_complicated_hierarchy_of_regions() {
   do {
@@ -68,6 +77,18 @@
   } while (0); // no-warning
 }
 
+void test_loc_as_integer_element_index_lifetime() {
+  do {
+    int x;
+    struct S3 *s = conjure_S3();
+    clang_analyzer_warnOnDeadSymbol((int)s);
+    x = (int)&(s->field);
+    ptr = &arr[x];
+    if (!s) {}
+  // FIXME: Should not warn. The symbol is still alive within the ptr's index.
+  } while (0); // expected-warning{{SYMBOL DEAD}}
+}
+
 // Test below checks lifetime of SymbolRegionValue in certain conditions.
 
 int **ptrptr;
@@ -78,3 +99,38 @@
   (void)0; // No-op; make sure the environment forgets things and the GC runs.
   clang_analyzer_eval(**ptrptr); // expected-warning{{TRUE}}
 } // no-warning
+
+int *produce_region_referenced_only_through_field_in_environment_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  int *x = &s->field;
+  return x;
+}
+
+void test_region_referenced_only_through_field_in_environment_value() {
+  produce_region_referenced_only_through_field_in_environment_value();
+} // expected-warning{{SYMBOL DEAD}}
+
+void test_region_referenced_only_through_field_in_store_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  ptr = &s->field; // Write the symbol into a global. It should live forever.
+  if (!s) {
+    exit(0); // no-warning (symbol should not die here)
+    // exit() is noreturn.
+    clang_analyzer_warnIfReached(); // no-warning
+  }
+  if (!ptr) { // no-warning (symbol should not die here)
+    // We exit()ed under these constraints earlier.
+    clang_analyzer_warnIfReached(); // no-warning
+  }
+  // The exit() call invalidates globals. The symbol will die here because
+  // the exit() statement itself is already over and there's no better statement
+  // to put the diagnostic on.
+} // expected-warning{{SYMBOL DEAD}}
+
+void test_zombie_referenced_only_through_field_in_store_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  int *x = &s->field;
+} // expected-warning{{SYMBOL DEAD}}
Index: include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
===================================================================
--- include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
+++ include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
@@ -183,7 +183,7 @@
   void dump() const;
 
   SymExpr::symbol_iterator symbol_begin() const {
-    const SymExpr *SE = getAsSymbolicExpression();
+    const SymExpr *SE = getAsSymbol(/*IncludeBaseRegions=*/true);
     if (SE)
       return SE->symbol_begin();
     else
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to