NoQ created this revision.
NoQ added reviewers: dcoughlin, xazax.hun, a.sidorin, george.karpenkov, szepet.
Herald added subscribers: cfe-commits, rnkovacs, baloghadamsoftware.
I've been investigating a false positive that had a pointer-type symbol with a
non-zero range which had its range forgotten and later assumed to be zero. The
big test that i added
(`test_region_referenced_only_through_field_in_store_value()`) roughly
re-creates the infeasible path that was taken. A bit simplified, it looks like
this:
1 int *p;
2
3 struct S {
4 int field;
5 };
6
7 struct S *conjure();
8
9 void foo() {
10 struct S *s = conjure();
11 p = &s->field;
12 if (!s) {
13 exit(0);
14 }
15 if (!p) {
16 clang_analyzer_warnIfReached(); // no-warning
17 }
18 }
It is easy to see that the path with `clang_analyzer_warnIfReached()` is
infeasible. The bug that causes us to explore that infeasible path is in the
`RegionStore`'s dead symbol removal algorithm that didn't realize that the
symbol is live. Recall that `RegionStore` scans itself with the help of a
worklist algorithm - it enforces the invariant of //"if a binding key is live,
then the respective binding value is also live"// (and a few other similar
invariants) and marks symbols and regions as live until the live set reaches a
fixed point (i.e. the worklist runs dry).
In our example the symbol in question is the return value of `conjure_S1()`.
We'd still refer to it as `$x`.
1. `$x` is present in the Store as a binding value of form
`&SymRegion{$x}->field` (which is equivalent to `$x + offsetof(field)`, i.e.
it's possible to easily recover `$x` from it).
2. Symbol `$x` is not mentioned anywhere else in the program state.
3. Additionally, there are no store bindings anywhere within region
`SymRegion{$x}`.
4. When `&SymRegion{$x}->field` is found as a binding value in
`removeDeadBindingsWorker::VisitBinding()`, symbolic base `SymRegion{$x}` of
region `SymRegion{$x}->field` is put into the worklist
(`removeDeadBindingsWorker::AddToWorkList()` descends to the base region
automatically).
5. However, `$x` is not added to the live set yet, because the loop through
sub-symbols (from `symbol_begin()` to `symbol_end()`) doesn't descend to from
`SymRegion{$x}->field` to its symbolic base `SymRegion{$x}` in order to find
`$x`.
6. In most cases `$x` will be found anyway when later processing the worklist.
Indeed, `removeDeadBindingsWorker::VisitCluster()` would take `SymRegion{$x}`
from the worklist and immediately find `$x` within it. This is actually one of
the "other similar invariants": //"if a binding key is live and is a symbolic
region, its parent symbol is live"//.
7. In our case, however, this will not happen, because `SymRegion{$x}` has no
bindings and therefore its cluster is empty and
`removeDeadBindingsWorker::VisitCluster()` will bail out early.
Essentially, it worked most of the time because most regions have bindings. But
normally it is not a responsibility of
`removeDeadBindingsWorker::VisitCluster()` to find symbol `$x` within binding
value `SymRegion{$x}->field` - it should definitely not try to care about the
liveness of stuff in empty clusters, those should only be live if there are
other reasons to believe they're alive. Being a binding value is one such
"other reason", which means that on step 5 we should have added `$x` to the
live set.
I believe that this is exactly how step 5 was intended to work. However, the
undocumented behavior of `symbol_begin()` that consists in not descending to
the base region was unexpected.
-----
In order to confirm my understanding, i tried to see what other users of
`symbol_begin()` expect. It turns out that three other users made the same
mistake, and in two of these three cases it led to keeping the symbol around
longer (instead shorter) than necessary (i.e. zombie symbols), while the third
case is similar to the original example albeit more exotic. In all other cases
users of `symbol_begin()` did not care if it descends to the base region or not.
Namely:
- `CStringChecker::checkLiveSymbols()` iterates over sub-symbols of string
length to mark string lengths as live. Because string length is for now only
composed of `CStringChecker`-tagged metadata symbols and constants, and
therefore it is a `NonLoc` that isn't a `nonloc::LocAsInteger`, it is
irrelevant whether `symbol_begin` descends to the symbolic base.
- `Environment::removeDeadBindings()` marks symbols within a value of a dead
expression as `maybeDead()`. I added a test case,
`test_region_referenced_only_through_field_in_environment_value()`, to
demonstrate that we indeed produce a zombie symbol due to `symbol_begin()` here
not descending to the symbolic base.
- `ScanReachableSymbols::scan(Sym)` scans sub-symbols of a raw symbol, not of
an `SVal`. Therefore it is not a region and there is no need to descend to the
symbolic base.
- `ProgramState::isTainted(Sym, Kind)` scans a raw symbol as well.
- `removeDeadBindingsWorker::VisitBinding()` the test case we've just discussed.
- `RegionStoreManager::removeDeadBindings` also has a bug similar to the one in
`Environment::removeDeadBindings()`, which is demonstrated by test case
`test_zombie_referenced_only_through_field_in_store_value()`.
- `SymExpr::computeComplexity()` scans a raw symbol.
- `SymbolReaper::markElementIndicesLive()` scans an array symbol, which is
always a `NonLoc`. However, it may still be a `nonloc::LocAsInteger`, and in
this case descend to the symbolic base would be necessary, as demonstrated by
`test_loc_as_integer_element_index_lifetime()`. This test, however, is not
fixed by the current patch, due to `getAsLocSymbol()` incorrectly failing to
pass its `IncludeBaseRegions` argument into the recursive call. But i decided
not to investigate it further today, because, well, //enough is enough//.
Repository:
rC Clang
https://reviews.llvm.org/D44347
Files:
include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
test/Analysis/symbol-reaper.c
Index: test/Analysis/symbol-reaper.c
===================================================================
--- test/Analysis/symbol-reaper.c
+++ test/Analysis/symbol-reaper.c
@@ -3,6 +3,9 @@
void clang_analyzer_eval(int);
void clang_analyzer_warnOnDeadSymbol(int);
void clang_analyzer_numTimesReached();
+void clang_analyzer_warnIfReached();
+
+void exit(int);
int conjure_index();
@@ -58,6 +61,12 @@
struct S2 {
struct S1 array[5];
} s2;
+struct S3 {
+ void *field;
+};
+
+struct S1 *conjure_S1();
+struct S3 *conjure_S3();
void test_element_index_lifetime_with_complicated_hierarchy_of_regions() {
do {
@@ -68,6 +77,18 @@
} while (0); // no-warning
}
+void test_loc_as_integer_element_index_lifetime() {
+ do {
+ int x;
+ struct S3 *s = conjure_S3();
+ clang_analyzer_warnOnDeadSymbol((int)s);
+ x = (int)&(s->field);
+ ptr = &arr[x];
+ if (!s) {}
+ // FIXME: Should not warn. The symbol is still alive within the ptr's index.
+ } while (0); // expected-warning{{SYMBOL DEAD}}
+}
+
// Test below checks lifetime of SymbolRegionValue in certain conditions.
int **ptrptr;
@@ -78,3 +99,38 @@
(void)0; // No-op; make sure the environment forgets things and the GC runs.
clang_analyzer_eval(**ptrptr); // expected-warning{{TRUE}}
} // no-warning
+
+int *produce_region_referenced_only_through_field_in_environment_value() {
+ struct S1 *s = conjure_S1();
+ clang_analyzer_warnOnDeadSymbol((int) s);
+ int *x = &s->field;
+ return x;
+}
+
+void test_region_referenced_only_through_field_in_environment_value() {
+ produce_region_referenced_only_through_field_in_environment_value();
+} // expected-warning{{SYMBOL DEAD}}
+
+void test_region_referenced_only_through_field_in_store_value() {
+ struct S1 *s = conjure_S1();
+ clang_analyzer_warnOnDeadSymbol((int) s);
+ ptr = &s->field; // Write the symbol into a global. It should live forever.
+ if (!s) {
+ exit(0); // no-warning (symbol should not die here)
+ // exit() is noreturn.
+ clang_analyzer_warnIfReached(); // no-warning
+ }
+ if (!ptr) { // no-warning (symbol should not die here)
+ // We exit()ed under these constraints earlier.
+ clang_analyzer_warnIfReached(); // no-warning
+ }
+ // The exit() call invalidates globals. The symbol will die here because
+ // the exit() statement itself is already over and there's no better
statement
+ // to put the diagnostic on.
+} // expected-warning{{SYMBOL DEAD}}
+
+void test_zombie_referenced_only_through_field_in_store_value() {
+ struct S1 *s = conjure_S1();
+ clang_analyzer_warnOnDeadSymbol((int) s);
+ int *x = &s->field;
+} // expected-warning{{SYMBOL DEAD}}
Index: include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
===================================================================
--- include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
+++ include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
@@ -183,7 +183,7 @@
void dump() const;
SymExpr::symbol_iterator symbol_begin() const {
- const SymExpr *SE = getAsSymbolicExpression();
+ const SymExpr *SE = getAsSymbol(/*IncludeBaseRegions=*/true);
if (SE)
return SE->symbol_begin();
else
Index: test/Analysis/symbol-reaper.c
===================================================================
--- test/Analysis/symbol-reaper.c
+++ test/Analysis/symbol-reaper.c
@@ -3,6 +3,9 @@
void clang_analyzer_eval(int);
void clang_analyzer_warnOnDeadSymbol(int);
void clang_analyzer_numTimesReached();
+void clang_analyzer_warnIfReached();
+
+void exit(int);
int conjure_index();
@@ -58,6 +61,12 @@
struct S2 {
struct S1 array[5];
} s2;
+struct S3 {
+ void *field;
+};
+
+struct S1 *conjure_S1();
+struct S3 *conjure_S3();
void test_element_index_lifetime_with_complicated_hierarchy_of_regions() {
do {
@@ -68,6 +77,18 @@
} while (0); // no-warning
}
+void test_loc_as_integer_element_index_lifetime() {
+ do {
+ int x;
+ struct S3 *s = conjure_S3();
+ clang_analyzer_warnOnDeadSymbol((int)s);
+ x = (int)&(s->field);
+ ptr = &arr[x];
+ if (!s) {}
+ // FIXME: Should not warn. The symbol is still alive within the ptr's index.
+ } while (0); // expected-warning{{SYMBOL DEAD}}
+}
+
// Test below checks lifetime of SymbolRegionValue in certain conditions.
int **ptrptr;
@@ -78,3 +99,38 @@
(void)0; // No-op; make sure the environment forgets things and the GC runs.
clang_analyzer_eval(**ptrptr); // expected-warning{{TRUE}}
} // no-warning
+
+int *produce_region_referenced_only_through_field_in_environment_value() {
+ struct S1 *s = conjure_S1();
+ clang_analyzer_warnOnDeadSymbol((int) s);
+ int *x = &s->field;
+ return x;
+}
+
+void test_region_referenced_only_through_field_in_environment_value() {
+ produce_region_referenced_only_through_field_in_environment_value();
+} // expected-warning{{SYMBOL DEAD}}
+
+void test_region_referenced_only_through_field_in_store_value() {
+ struct S1 *s = conjure_S1();
+ clang_analyzer_warnOnDeadSymbol((int) s);
+ ptr = &s->field; // Write the symbol into a global. It should live forever.
+ if (!s) {
+ exit(0); // no-warning (symbol should not die here)
+ // exit() is noreturn.
+ clang_analyzer_warnIfReached(); // no-warning
+ }
+ if (!ptr) { // no-warning (symbol should not die here)
+ // We exit()ed under these constraints earlier.
+ clang_analyzer_warnIfReached(); // no-warning
+ }
+ // The exit() call invalidates globals. The symbol will die here because
+ // the exit() statement itself is already over and there's no better statement
+ // to put the diagnostic on.
+} // expected-warning{{SYMBOL DEAD}}
+
+void test_zombie_referenced_only_through_field_in_store_value() {
+ struct S1 *s = conjure_S1();
+ clang_analyzer_warnOnDeadSymbol((int) s);
+ int *x = &s->field;
+} // expected-warning{{SYMBOL DEAD}}
Index: include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
===================================================================
--- include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
+++ include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
@@ -183,7 +183,7 @@
void dump() const;
SymExpr::symbol_iterator symbol_begin() const {
- const SymExpr *SE = getAsSymbolicExpression();
+ const SymExpr *SE = getAsSymbol(/*IncludeBaseRegions=*/true);
if (SE)
return SE->symbol_begin();
else
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits