seaneveson created this revision. seaneveson added a subscriber: cfe-commits.
Dear All, We have been looking at the following problem, where any code after the constant bound loop is not analyzed because of the limit on how many times the same block is visited, as described in bugzillas #7638 and #23438. This problem is of interest to us because we have identified significant bugs that the checkers are not locating. We have been discussing a solution involving ranges as a longer term project, but I would like to propose a patch to improve the current implementation. Example issue: ``` for (int i = 0; i < 1000; ++i) {...something...} int *p = 0; *p = 0xDEADBEEF; ``` The proposal is to go through the first and last iterations of the loop. The patch creates an exploded node for the approximate last iteration of constant bound loops, before the max loop limit / block visit limit is reached. It does this by identifying the variable in the loop condition and finding the value which is “one away” from the loop being false. For example, if the condition is (x < 10), then an exploded node is created where the value of x is 9. Evaluating the loop body with x = 9 will then result in the analysis continuing after the loop, providing x is incremented. The patch passes all the tests, with some modifications to coverage.c, in order to make the ‘function_which_gives_up’ continue to give up, since the changes allowed the analysis to progress past the loop. This patch does introduce possible false positives, as a result of not knowing the state of variables which might be modified in the loop. I believe that, as a user, I would rather have false positives after loops than do no analysis at all. I understand this may not be the common opinion and am interested in hearing your views. There are also issues regarding break statements, which are not considered. A more advanced implementation of this approach might be able to consider other conditions in the loop, which would allow paths leading to breaks to be analyzed. Lastly, I have performed a study on large code bases and I think there is little benefit in having “max-loop” default to 4 with the patch. For variable bound loops this tends to result in duplicated analysis after the loop, and it makes little difference to any constant bound loop which will do more than a few iterations. It might be beneficial to lower the default to 2, especially for the shallow analysis setting. Please let me know your opinions on this approach to processing constant bound loops and the patch itself. Regards, Sean Eveson SN Systems - Sony Computer Entertainment Group http://reviews.llvm.org/D12358 Files: lib/StaticAnalyzer/Core/ExprEngine.cpp test/Analysis/constant-bound-loops.c test/Analysis/coverage.c
Index: test/Analysis/coverage.c =================================================================== --- test/Analysis/coverage.c +++ test/Analysis/coverage.c @@ -15,13 +15,15 @@ } static void function_which_gives_up(int *x) { - for (int i = 0; i < 5; ++i) + // i + 1 to prevent the analyzer approximating the last iteration + for (int i = 0; i + 1 < 6; ++i) (*x)++; } static void function_which_gives_up_nested(int *x) { function_which_gives_up(x); - for (int i = 0; i < 5; ++i) + // i + 1 to prevent the analyzer approximating the last iteration + for (int i = 0; i + 1 < 6; ++i) (*x)++; } @@ -55,7 +57,8 @@ } // expected-warning {{Potential leak of memory pointed to by 'm'}} void coverage5(int *x) { - for (int i = 0; i<7; ++i) + // i + 1 to prevent the analyzer approximating the last iteration + for (int i = 0; i + 1 < 8; ++i) function_which_gives_up(x); // The root function gives up here. char *m = (char*)malloc(12); // no-warning @@ -83,7 +86,8 @@ void function_which_gives_up_settonull(int **x) { *x = 0; int y = 0; - for (int i = 0; i < 5; ++i) + // i + 1 to prevent the analyzer approximating the last iteration + for (int i = 0; i + 1 < 6; ++i) y++; } Index: test/Analysis/constant-bound-loops.c =================================================================== --- test/Analysis/constant-bound-loops.c +++ test/Analysis/constant-bound-loops.c @@ -0,0 +1,164 @@ +// RUN: %clang_cc1 -analyze -analyzer-checker=core,unix.Malloc,debug.ExprInspection -analyzer-store=region -analyzer-max-loop 4 -verify %s + +extern void clang_analyzer_eval(_Bool); + +typedef __typeof(sizeof(int)) size_t; +void *malloc(size_t); + +void incr_for_loop() { + int i; + for (i = 0; i < 10; ++i) {} + clang_analyzer_eval(i == 10); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void decr_for_loop() { + int i; + for (i = 10; i > 0; --i) {} + clang_analyzer_eval(i == 0); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void incr_while_loop() { + int i = 0; + while (i < 10) {++i;} + clang_analyzer_eval(i == 10); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void decr_while_loop() { + int i = 10; + while (i > 0) {--i;} + clang_analyzer_eval(i == 0); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void incr_do_while_loop() { + int i = 0; + do {++i;} while (i < 10); + clang_analyzer_eval(i == 10); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void decr_do_while_loop() { + int i = 10; + do {--i;} while (i > 0); + clang_analyzer_eval(i == 0); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void negative_incr_for_loop() { + int i; + for (i = -10; i < 5 - 10; ++i) {} + clang_analyzer_eval(i == -5); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void negative_decr_for_loop() { + int i; + for (i = 5 - 10; i > -20; --i) {} + clang_analyzer_eval(i == -20); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void larger_incr_for_loop() { + int i; + for (i = 0; i < 20; i += 3) {} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void larger_decr_for_loop() { + int i; + for (i = 20; i > 0; i -= 3) {} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void unsigned_incr_for_loop() { + unsigned i; + for (i = 0; i < 10; ++i) {} + clang_analyzer_eval(i == 10); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void unsigned_decr_for_loop() { + unsigned i; + for (i = 10; i > 0; --i) {} + clang_analyzer_eval(i == 0); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void short_for_loop() { + short i; + for (i = 0; i < 10; ++i) {} + clang_analyzer_eval(i == 10); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void lte_for_loop() { + int i; + for (i = 0; i <= 10; ++i) {} + clang_analyzer_eval(i == 11); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void gte_for_loop() { + int i; + for (i = 10; i >= 0; --i) {} + clang_analyzer_eval(i == -1); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void incr_ne_for_loop() { + int i; + for (i = 0; i != 10; ++i) {} + clang_analyzer_eval(i == 10); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void decr_ne_for_loop() { + int i; + for (i = 10; i != 0; --i) {} + clang_analyzer_eval(i == 0); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void reversed_condition_for_loop() { + int i; + for (i = 0; 10 > i; ++i) {} + clang_analyzer_eval(i == 10); // expected-warning {{TRUE}} + for (i = 10; 0 < i; --i) {} + clang_analyzer_eval(i == 0); // expected-warning {{TRUE}} + for (i = 0; 10 >= i; ++i) {} + clang_analyzer_eval(i == 11); // expected-warning {{TRUE}} + for (i = 10; 0 <= i; --i) {} + clang_analyzer_eval(i == -1); // expected-warning {{TRUE}} + char *m = (char*)malloc(12); +} // expected-warning {{Potential leak of memory pointed to by 'm'}} + +void infinite_for_loop() { + int i; + for (i = 0; i < 1; i += 0) {} + char *m = (char*)malloc(12); +} // no-warning + +void infinite_incr_for_loop() { + int i; + for (i = 1; i > 0; ++i) {} + char *m = (char*)malloc(12); +} // no-warning + +void infinite_decr_for_loop() { + int i; + for (i = 0; i < 1; --i) {} + char *m = (char*)malloc(12); +} // no-warning + +void infinite_while_loop() { + int i = 0; + while (i < 10) { + ++i; + --i; + } + char *m = (char*)malloc(12); +} // no-warning + Index: lib/StaticAnalyzer/Core/ExprEngine.cpp =================================================================== --- lib/StaticAnalyzer/Core/ExprEngine.cpp +++ lib/StaticAnalyzer/Core/ExprEngine.cpp @@ -1519,6 +1519,125 @@ llvm_unreachable("could not resolve condition"); } +static const VarDecl *getSingleVariable(const Expr *Operand) { + if (const CastExpr *CE = dyn_cast<CastExpr>(Operand)) { + return getSingleVariable(CE->getSubExpr()); + } + if (const DeclRefExpr *DR = dyn_cast<DeclRefExpr>(Operand)) { + return dyn_cast<VarDecl>(DR->getDecl()); + } + return NULL; +} + +static ProgramStateRef getLoopLastItterationState(const Stmt *Loop, + ProgramStateRef PrevState, + const LocationContext *LCtx, + SValBuilder &svalBuilder, + ASTContext &Context) { + // Get loop condition + const Stmt *ConditionStmt; + switch (Loop->getStmtClass()) { + default: + return NULL; + case Stmt::ForStmtClass: + ConditionStmt = cast<ForStmt>(Loop)->getCond(); + break; + case Stmt::WhileStmtClass: + ConditionStmt = cast<WhileStmt>(Loop)->getCond(); + break; + case Stmt::DoStmtClass: + ConditionStmt = cast<DoStmt>(Loop)->getCond(); + break; + } + + if (const BinaryOperator *BOCond = dyn_cast<BinaryOperator>(ConditionStmt)) { + // Look for conditions with a single variable on one side and a constant + // expression on the other + BinaryOperator::Opcode opCode = BOCond->getOpcode(); + const VarDecl *LoopVariable; + const Expr *LoopVarExpr; + APSInt EdgeVal; + + bool Success = false; + if ((LoopVariable = getSingleVariable(BOCond->getLHS())) != NULL) { + LoopVarExpr = BOCond->getLHS(); + Success = BOCond->getRHS()->EvaluateAsInt(EdgeVal, Context); + } + else if ((LoopVariable = getSingleVariable(BOCond->getRHS())) != NULL) { + LoopVarExpr = BOCond->getRHS(); + Success = BOCond->getLHS()->EvaluateAsInt(EdgeVal, Context); + if (Success) { + // change the opcode so the following code can treat the + // condition as if the varaible is always on the LHS + switch (opCode) { + default: break; + case BO_LT: opCode = BO_GT; break; + case BO_GT: opCode = BO_LT; break; + case BO_LE: opCode = BO_GE; break; + case BO_GE: opCode = BO_LE; break; + } + } + } + if (!Success) return NULL; + + // Convert NE to LT or GT + if (opCode == BO_NE) { + // Try to get the current value of the loop variable + SVal CurSV = PrevState->getSVal(LoopVarExpr, LCtx); + APSInt CurVal; + if (auto NLI = CurSV.getAs<nonloc::ConcreteInt>()) { + CurVal = NLI->getValue(); + } + else if (auto LI = CurSV.getAs<loc::ConcreteInt>()) { + CurVal = LI->getValue(); + } + else { + return NULL; + } + // Check signedness and ensure matching bit widths + assert(CurVal.isSigned() == EdgeVal.isSigned() && + "Signedness missmatch"); + if (CurVal.getBitWidth() > EdgeVal.getBitWidth()) { + EdgeVal.extend(CurVal.getBitWidth()); + } + else if (CurVal.getBitWidth() < EdgeVal.getBitWidth()) { + CurVal.extend(EdgeVal.getBitWidth()); + } + // Compare with the edge value + assert(CurVal != EdgeVal && "Condition is already false"); + if (CurVal < EdgeVal) { + opCode = BO_LT; + } + else { + opCode = BO_GT; + } + } + + // Set the value to the approximate last value of the loop condition + // and check the condition is supported + switch (opCode) { + default: + return NULL; + case BO_LT: + --EdgeVal; + break; + case BO_GT: + ++EdgeVal; + break; + case BO_LE: + case BO_GE: + break; + } + + // Create the approximate state of the last itteration + Loc LVLoc = PrevState->getLValue(LoopVariable, LCtx); + nonloc::ConcreteInt EndSV = svalBuilder.makeIntVal(EdgeVal); + return PrevState->bindLoc(LVLoc, EndSV); + + } // Condition = BinaryOperator + return NULL; +} + void ExprEngine::processBranch(const Stmt *Condition, const Stmt *Term, NodeBuilderContext& BldCtx, ExplodedNode *Pred, @@ -1598,6 +1717,26 @@ ProgramStateRef StTrue, StFalse; std::tie(StTrue, StFalse) = PrevState->assume(V); + // Try to get approximate last iteration for constant bound loops + if (builder.isFeasible(true) && StTrue && + builder.isFeasible(false) && !StFalse && + BldCtx.blockCount() == AMgr.options.maxBlockVisitOnPath - 1) { + + ProgramStateRef StLast; + StLast = getLoopLastItterationState(Term, PrevState, + PredI->getLocationContext(), + svalBuilder, getContext()); + if (StLast) { + // Generate a new node for the true branch (which must exist) + StLast = StLast->assume(V, true); + assert(StLast && "presumed last itteration is not possible"); + builder.generateNode(StLast, true, PredI); + // Don't process the normal branches + currBldrCtx = nullptr; + return; + } + } + // Process the true branch. if (builder.isFeasible(true)) { if (StTrue)
_______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits