Szelethus created this revision. Szelethus added reviewers: NoQ, xazax.hun, martong, whisperity, rnkovacs, dcoughlin, baloghadamsoftware. Szelethus added a project: clang. Herald added subscribers: cfe-commits, DenisDvlp, steakhal, Charusso, gamesh411, dkrupp, donat.nagy, mikhail.ramalho, a.sidorin, szepet, mgorny.
The following revision adds the basic infrastructure for a reaching definitions algorithm for C++. Short description & motivation ============================== This is a dataflow algorithm designed to find a set of definitions (for example assignments) to variables in a given `CFGBlock`. To demonstrate, in the following code snippet: int flag; void foo(); void f() { int *x = nullptr; flag = 1; foo(); if (flag) x = new int; foo(); if (flag) *x = 5; } The CFG would look like this: -> [B3] -> -> [B1] -> / \ / \ [B5 (ENTRY)] -> [B4] ------> [B2] ---> [B0 (EXIT)] Should `foo()` change `flag`'s value on the first function call to `false`, then `true` on the second, a null pointer dereference error would occur. Using a reaching definitions calculator, we can retrieve that the set of ingoing definitions to B1 <https://reviews.llvm.org/B1> is `{(flag, [B2]), (x, [B3]), (x, [B4])}`. The set hints that `x` has a reaching definitions that would not have caused a nullpointer dereference. The algorithm ============= A reaching definition for a given instruction is an earlier instruction whose target variable can reach (be assigned to) the given one without an intervening assignment. The similarly named reaching definitions is a data-flow analysis which statically determines which definitions may reach a given point in the code [1]. The set of ingoing and outgoing reaching definitions are calculated for each basic block with set operations on GEN and KILL sets. This could be further fine-grained so the RD sets would be a property of a statement, rather then an entire `CFGBlock`, but I didn't want to delay the publishing of this patch any further, and I believe that such a change wouldn't hurt the infrastructure much. Implementation ============== As the formal definition would suggest, this was conceived for instructions, which is why even such terms as "variable" or "definition" can be hard to define for C++. For this reason, the patch spares a lot of LOC for documentation and reasoning. While the algorithm itself is simple (and is implemented quite literally from [1]), the problematic part of this is the generation of of GEN and KILL sets. I tried to introduce an infrastructure that can tolerate a lot of things I have inevitable forgotten (or left for followup patches) with the use of easy to add AST matchers. Immediate questions to address ============================== > We already have 2 dataflow algorithms implemented in the Analysis library: > UninitializedObject and LiveVariables. Reaching definitions doesn't sound all > too dissimilar. Why are we adding hundreds of LOC here? Can't we reuse some > of it? UninitializedObject and LiveVariables are practically the same algorithm with minimal differences. Despite this, they take up ~750 and ~1050 LOC respectively. Both of those algorithms can be expressed with the use of GEN and KILL sets, yet none of them are, and they duplicate a lot of logic. It isn't terribly obvious however how their logic could be (or really, should be) merged. Shockingly, we have no GEN and KILL set implementations in Clang. I think that is the main addition of this patch, even if it unfortunately duplicates some logic. The previously mentioned two analyses however could serve as an inspiration. > UninitializedObject and LiveVariables uses ASTVisitors rather than > ASTMatchers. The latter is more expensive. Why did we go with them? Matchers are more expressive. For instance, how would you find `s.a.x.z` with visitors? Its doable, but requires to keep track of a lot of state and would make the implementation ugly. I don't have complete confidence in my decision here, so I welcome alternative suggestions or counterarguments. > What are the main things to get done? In order: - Finalize the infrastructure for GEN set generation. - Make it possible to calculate RD sets for statements, not only blocks. - Improve the interface of `ReachingDefinitionsCalculator`. What we like to query? Most probably the set of ingoing RDs for a variable at a given statement. - Performance: use immutable data structures and a better CFG traversal strategy. Further reading =============== My GSoC project: https://szelethus.github.io/gsoc2019/ (this has a lot of pointers to related discussions in the 'Reaching Definitions Analysis' section) [cfe-dev] Dataflow analyses in Clang -- why do we not have GEN and KILL sets? http://lists.llvm.org/pipermail/cfe-dev/2020-March/064893.html [analyzer][WIP] Implement a primitive reaching definitions analysis D64991 <https://reviews.llvm.org/D64991> References ========== My main source was wikipedia: [1] https://en.wikipedia.org/wiki/Reaching_definition I read the following articles, but they didn't give me the information I needed: Tonella, Paolo, et al. "Variable precision reaching definitions analysis for software maintenance." Proceedings. First Euromicro Conference on Software Maintenance and Reengineering. IEEE, 1997. Collard, Jean-François, and Jens Knoop. "A comparative study of reaching-definitions analyses." (1998). Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D76287 Files: clang/include/clang/Analysis/Analyses/ReachingDefinitions.h clang/include/clang/StaticAnalyzer/Checkers/Checkers.td clang/lib/Analysis/CMakeLists.txt clang/lib/Analysis/ReachingDefinitions.cpp clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp clang/test/Analysis/dump-definitions.cpp
Index: clang/test/Analysis/dump-definitions.cpp =================================================================== --- /dev/null +++ clang/test/Analysis/dump-definitions.cpp @@ -0,0 +1,153 @@ +// RUN: %clang_analyze_cc1 %s \ +// RUN: -analyzer-checker=debug.DumpCFG \ +// RUN: -analyzer-checker=debug.DumpGenSets \ +// RUN: -analyzer-checker=debug.DumpKillSets \ +// RUN: -analyzer-checker=debug.DumpReachingDefinitions \ +// RUN: 2>&1 | FileCheck %s + +int getInt(); +int *getIntPtr(); +bool coin(); + +void single_vardecl_in_declstmt() { + int *ptr = getIntPtr(); +} +// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)] + +// CHECK: GEN sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: 1 (ptr, [1, 3]) <write> +// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid]) +// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write> +// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write> +// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write> + +void multiple_vardecl_in_declstmt() { + int *ptr = getIntPtr(), i; +} +// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)] + +// CHECK: GEN sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: 1 (ptr, [1, 3]) <write> +// CHECK-NEXT: 1 (i, [1, 4]) <write> +// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid]) +// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write> +// CHECK-NEXT: 0 IN (i, [1, 4]) <write> +// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write> +// CHECK-NEXT: 0 OUT (i, [1, 4]) <write> +// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write> +// CHECK-NEXT: 1 OUT (i, [1, 4]) <write> + +void function_and_vardecl_in_declstmt() { + int *ptr = getIntPtr(), a(); +} +// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)] + +// CHECK: GEN sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: 1 (ptr, [1, 3]) <write> +// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid]) +// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write> +// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write> +// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write> + +void single_def_in_same_block() { + int *ptr = getIntPtr(); + + if (coin()) + ptr = 0; + + if (!ptr) + *ptr = 5; +} +// -> [B3] -> -> [B1] -> +// / \ / \ +// [B5 (ENTRY)] -> [B4] ------> [B2] ---> [B0 (EXIT)] + +// CHECK: GEN sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: 3 (ptr, [3, 3]) <write> +// CHECK-NEXT: 4 (ptr, [4, 3]) <write> +// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: 3 (ptr, [4, 3]) <write> +// CHECK-NEXT: 4 (ptr, [3, 3]) <write> +// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid]) +// CHECK-NEXT: 0 IN (ptr, [3, 3]) <write> +// CHECK-NEXT: 0 IN (ptr, [4, 3]) <write> +// CHECK-NEXT: 0 OUT (ptr, [3, 3]) <write> +// CHECK-NEXT: 0 OUT (ptr, [4, 3]) <write> +// CHECK-NEXT: 1 IN (ptr, [3, 3]) <write> +// CHECK-NEXT: 1 IN (ptr, [4, 3]) <write> +// CHECK-NEXT: 1 OUT (ptr, [3, 3]) <write> +// CHECK-NEXT: 1 OUT (ptr, [4, 3]) <write> +// CHECK-NEXT: 2 IN (ptr, [3, 3]) <write> +// CHECK-NEXT: 2 IN (ptr, [4, 3]) <write> +// CHECK-NEXT: 2 OUT (ptr, [3, 3]) <write> +// CHECK-NEXT: 2 OUT (ptr, [4, 3]) <write> +// CHECK-NEXT: 3 IN (ptr, [4, 3]) <write> +// CHECK-NEXT: 3 OUT (ptr, [3, 3]) <write> +// CHECK-NEXT: 4 OUT (ptr, [4, 3]) <write> + +void different_assignments() { + int i = getInt(); + + if (coin()) + i = 0; + + i += 3; + + if (!coin()) + i -= 2; + + i *= 9; + + if (i = 0) + ; +} +// -> [B5] -> -> [B3] -> -> [B1] -> +// / \ / \ / \ +// [B7 (ENTRY)] -> [B6] ------> [B4] -------> [B2] ---> [B0 (EXIT)] + +// CHECK: GEN sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: 2 (i, [2, 5]) <write> +// CHECK-NEXT: 3 (i, [3, 2]) <write> +// CHECK-NEXT: 4 (i, [4, 2]) <write> +// CHECK-NEXT: 5 (i, [5, 2]) <write> +// CHECK-NEXT: 6 (i, [6, 3]) <write> +// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid]) +// CHECK-NEXT: 2 (i, [3, 2]) <write> +// CHECK-NEXT: 2 (i, [4, 2]) <write> +// CHECK-NEXT: 2 (i, [5, 2]) <write> +// CHECK-NEXT: 2 (i, [6, 3]) <write> +// CHECK-NEXT: 3 (i, [2, 5]) <write> +// CHECK-NEXT: 3 (i, [4, 2]) <write> +// CHECK-NEXT: 3 (i, [5, 2]) <write> +// CHECK-NEXT: 3 (i, [6, 3]) <write> +// CHECK-NEXT: 4 (i, [2, 5]) <write> +// CHECK-NEXT: 4 (i, [3, 2]) <write> +// CHECK-NEXT: 4 (i, [5, 2]) <write> +// CHECK-NEXT: 4 (i, [6, 3]) <write> +// CHECK-NEXT: 5 (i, [2, 5]) <write> +// CHECK-NEXT: 5 (i, [3, 2]) <write> +// CHECK-NEXT: 5 (i, [4, 2]) <write> +// CHECK-NEXT: 5 (i, [6, 3]) <write> +// CHECK-NEXT: 6 (i, [2, 5]) <write> +// CHECK-NEXT: 6 (i, [3, 2]) <write> +// CHECK-NEXT: 6 (i, [4, 2]) <write> +// CHECK-NEXT: 6 (i, [5, 2]) <write> +// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid]) +// CHECK-NEXT: 0 IN (i, [2, 5]) <write> +// CHECK-NEXT: 0 OUT (i, [2, 5]) <write> +// CHECK-NEXT: 1 IN (i, [2, 5]) <write> +// CHECK-NEXT: 1 OUT (i, [2, 5]) <write> +// CHECK-NEXT: 2 IN (i, [3, 2]) <write> +// CHECK-NEXT: 2 IN (i, [4, 2]) <write> +// CHECK-NEXT: 2 OUT (i, [2, 5]) <write> +// CHECK-NEXT: 3 IN (i, [4, 2]) <write> +// CHECK-NEXT: 3 OUT (i, [3, 2]) <write> +// CHECK-NEXT: 4 IN (i, [5, 2]) <write> +// CHECK-NEXT: 4 IN (i, [6, 3]) <write> +// CHECK-NEXT: 4 OUT (i, [4, 2]) <write> +// CHECK-NEXT: 5 IN (i, [6, 3]) <write> +// CHECK-NEXT: 5 OUT (i, [5, 2]) <write> +// CHECK-NEXT: 6 OUT (i, [6, 3]) <write> Index: clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp =================================================================== --- clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp +++ clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp @@ -10,12 +10,13 @@ // //===----------------------------------------------------------------------===// -#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h" #include "clang/Analysis/Analyses/Dominators.h" #include "clang/Analysis/Analyses/LiveVariables.h" +#include "clang/Analysis/Analyses/ReachingDefinitions.h" #include "clang/Analysis/CallGraph.h" -#include "clang/StaticAnalyzer/Core/Checker.h" +#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h" #include "clang/StaticAnalyzer/Core/BugReporter/BugType.h" +#include "clang/StaticAnalyzer/Core/Checker.h" #include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h" #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h" #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h" @@ -102,6 +103,72 @@ return true; } +//===----------------------------------------------------------------------===// +// GenSetDumper +//===----------------------------------------------------------------------===// + +namespace { +class GenSetDumper : public Checker<check::ASTCodeBody> { +public: + void checkASTCodeBody(const Decl *D, AnalysisManager &mgr, + BugReporter &BR) const { + if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D)) + AC->getAnalysis<ReachingDefinitionsCalculator>()->dumpGenSet(); + } +}; +} // namespace + +void ento::registerGenSetDumper(CheckerManager &mgr) { + mgr.registerChecker<GenSetDumper>(); +} + +bool ento::shouldRegisterGenSetDumper(const LangOptions &LO) { return true; } + +//===----------------------------------------------------------------------===// +// KillSetDumper +//===----------------------------------------------------------------------===// + +namespace { +class KillSetDumper : public Checker<check::ASTCodeBody> { +public: + void checkASTCodeBody(const Decl *D, AnalysisManager &mgr, + BugReporter &BR) const { + if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D)) + AC->getAnalysis<ReachingDefinitionsCalculator>()->dumpKillSet(); + } +}; +} // namespace + +void ento::registerKillSetDumper(CheckerManager &mgr) { + mgr.registerChecker<KillSetDumper>(); +} + +bool ento::shouldRegisterKillSetDumper(const LangOptions &LO) { return true; } + +//===----------------------------------------------------------------------===// +// ReachingDefinitionsDumper +//===----------------------------------------------------------------------===// + +namespace { +class ReachingDefinitionsDumper : public Checker<check::ASTCodeBody> { +public: + void checkASTCodeBody(const Decl *D, AnalysisManager &mgr, + BugReporter &BR) const { + if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D)) + AC->getAnalysis<ReachingDefinitionsCalculator>() + ->dumpReachingDefinitions(); + } +}; +} // namespace + +void ento::registerReachingDefinitionsDumper(CheckerManager &mgr) { + mgr.registerChecker<ReachingDefinitionsDumper>(); +} + +bool ento::shouldRegisterReachingDefinitionsDumper(const LangOptions &LO) { + return true; +} + //===----------------------------------------------------------------------===// // LiveVariablesDumper //===----------------------------------------------------------------------===// Index: clang/lib/Analysis/ReachingDefinitions.cpp =================================================================== --- /dev/null +++ clang/lib/Analysis/ReachingDefinitions.cpp @@ -0,0 +1,319 @@ +//===--- ReachableDefinitions.cpp -------------------------------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// Calculates reachable definitions for a variable. +// +//===----------------------------------------------------------------------===// + +#include "clang/Analysis/Analyses/ReachingDefinitions.h" +#include "clang/AST/Decl.h" +#include "clang/AST/DeclCXX.h" +#include "clang/AST/DeclLookups.h" +#include "clang/AST/Expr.h" +#include "clang/ASTMatchers/ASTMatchFinder.h" +#include "clang/ASTMatchers/ASTMatchers.h" +#include "clang/ASTMatchers/ASTMatchersInternal.h" +#include "clang/Analysis/CFG.h" +#include "clang/Basic/LLVM.h" +#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/SetOperations.h" +#include "llvm/Support/ErrorHandling.h" +#include <memory> + +using namespace clang; +using namespace ReachingDefinitionsDetail; +using namespace ast_matchers; + +using DefinitionKind = Definition::DefinitionKind; + +//===----------------------------------------------------------------------===// +// Utility functions. +//===----------------------------------------------------------------------===// + +static bool killsVar(const Definition &Victim, const GenSet &Set) { + for (const Definition &Perpetrator : Set) + if (Victim.Var == Perpetrator.Var) + return true; + return false; +} + +static StringRef describeDefinitionKind(DefinitionKind K) { + switch (K) { + case Definition::Write: + return "write"; + } +} + +//===----------------------------------------------------------------------===// +// Methods of Definition. +//===----------------------------------------------------------------------===// + +void Definition::dump() const { + llvm::errs() << "(" << Var->getNameAsString(); + + llvm::errs() << ", [" << getCFGBlock()->getIndexInCFG() << ", " + << E.getIndexInBlock() << "])" + << " <" << (describeDefinitionKind(Kind)) << ">"; +} + +//===----------------------------------------------------------------------===// +// Matcher callbacks for constructing GEN sets for the variable finding stage. +//===----------------------------------------------------------------------===// + +class VariableCollectorCB : public GenSetMatcherCallback { +public: + VariableCollectorCB(GenSetBuilder &GSBuilder) + : GenSetMatcherCallback(GSBuilder) {} + + static internal::Matcher<Stmt> getMatcher() { + return stmt(forEachDescendant(declRefExpr().bind("var"))); + } + + virtual void run(const MatchFinder::MatchResult &Result) override { + const auto *E = Result.Nodes.getNodeAs<DeclRefExpr>("var"); + assert(E); + if (const VarDecl *Var = dyn_cast<VarDecl>(E->getDecl())) + GSBuilder.addVariable(Var); + } +}; + +//===----------------------------------------------------------------------===// +// Matcher callbacks for constructing GEN sets for the definition finding stage. +//===----------------------------------------------------------------------===// + +class AssignmentOperatorCB : public GenSetMatcherCallback { +public: + AssignmentOperatorCB(GenSetBuilder &GSBuilder) + : GenSetMatcherCallback(GSBuilder) {} + + static internal::Matcher<Stmt> getMatcher() { + return binaryOperator(isAssignmentOperator()).bind("assign"); + } + + virtual void run(const MatchFinder::MatchResult &Result) override { + const auto *BO = Result.Nodes.getNodeAs<BinaryOperator>("assign"); + assert(BO); + GSBuilder.handleExpr(BO->getLHS(), DefinitionKind::Write); + } +}; + +class DeclarationCB : public GenSetMatcherCallback { +public: + DeclarationCB(GenSetBuilder &GSBuilder) : GenSetMatcherCallback(GSBuilder) {} + + static internal::Matcher<Stmt> getMatcher() { + return declStmt().bind("decls"); + } + + virtual void run(const MatchFinder::MatchResult &Result) override { + const auto *DS = Result.Nodes.getNodeAs<DeclStmt>("decls"); + assert(DS); + + for (const Decl *D : DS->decls()) { + const auto *Var = dyn_cast<VarDecl>(D); + if (!Var) + continue; + + GSBuilder.insertToGenSet(Var, DefinitionKind::Write); + } + } +}; + +//===----------------------------------------------------------------------===// +// Matcher callbacks for constructing GEN sets for the expression finding stage. +//===----------------------------------------------------------------------===// + +class DeclRefExprCB : public GenSetMatcherCallback { +public: + DeclRefExprCB(GenSetBuilder &GSBuilder) : GenSetMatcherCallback(GSBuilder) {} + + static internal::Matcher<Stmt> getMatcher() { + return declRefExpr(to(varDecl().bind("var"))); + } + + virtual void run(const MatchFinder::MatchResult &Result) override { + const auto *Var = Result.Nodes.getNodeAs<VarDecl>("var"); + assert(Var); + GSBuilder.insertToGenSet(Var); + } +}; + +//===----------------------------------------------------------------------===// +// Methods of GenSetBuilder. +//===----------------------------------------------------------------------===// + +GenSetBuilder::GenSetBuilder(const Decl *D) + : D(D), Context(&D->getASTContext()) { + + // TODO: Should we match the entire TU for nested static variables? + addMatcher<&GenSetBuilder::VariableFinder, VariableCollectorCB>(); + + // TODO: Elvis operator (?:). + // TODO: C++ MemberExpr? + // TODO: ParenExpr? (((a))) = 3; + addMatcher<&GenSetBuilder::ExpressionFinder, DeclRefExprCB>(); + + // TODO: Destructor calls? Should we be THAT conservative? + // TODO: Regular function calls? + // TODO: Moving an object? + // TODO: Method calls? + // TODO: Analyzing a method? + // TODO: What do you do with Objective.*? + // TODO: Exceptions? + addMatcher<&GenSetBuilder::DefinitionFinder, AssignmentOperatorCB>(); + addMatcher<&GenSetBuilder::DefinitionFinder, DeclarationCB>(); + + // Collect all used variables. + if (const auto *FD = dyn_cast<FunctionDecl>(D)) { + VariableFinder.match(*FD, FD->getASTContext()); + for (const ParmVarDecl *Param : FD->parameters()) + addVariable(Param); + } + + // TODO: Collect all visible, non-local variables as well. +} + +void GenSetBuilder::getGenSetForCFGBlock(const CFGBlock *B, GenSet &Gen) { + if (B->empty()) + return; + + VariableFinder.match(*D, D->getASTContext()); + + CurrentGenSet = &Gen; + + for (CFGBlock::ConstCFGElementRef E : B->rrefs()) { + if (E->getKind() != CFGElement::Kind::Statement) + continue; + CurrentCFGElem = E; + + const Stmt *S = E->castAs<CFGStmt>().getStmt(); + assert(S); + DefinitionFinder.match(*S, D->getASTContext()); + } + + CurrentGenSet = nullptr; +} + +void GenSetBuilder::handleExpr(const Expr *E, DefinitionKind Kind) { + CurrentKind = Kind; + ExpressionFinder.match(*E, *Context); +} + +void GenSetBuilder::insertToGenSet(const VarDecl *Var, + DefinitionKind Kind) { + CurrentGenSet->emplace(Var, *CurrentCFGElem, Kind); +} + +void GenSetBuilder::addVariable(const VarDecl *Var) { + AllVariables.emplace(Var); +} + +//===----------------------------------------------------------------------===// +// Methods of ReachingDefinitionsCalculator. +//===----------------------------------------------------------------------===// + +ReachingDefinitionsCalculator::ReachingDefinitionsCalculator(const Decl *D, + const CFG *cfg) + : cfg(cfg), GSBuilder(D) { + + for (const CFGBlock *B : *cfg) + GSBuilder.getGenSetForCFGBlock(B, Gen[B]); + // TODO: We ignore parameters! We should collect them as definitions as well. + + calculate(); +} + +void ReachingDefinitionsCalculator::init() { + llvm::SmallVector<Definition, 16> AllDefinitions; + for (const std::pair<const CFGBlock *, GenSet> G : Gen) + for (const Definition &Def : G.second) + AllDefinitions.push_back(Def); + + for (const std::pair<const CFGBlock *, GenSet> G : Gen) + for (const Definition &Def : AllDefinitions) + if (G.first != Def.getCFGBlock() && killsVar(Def, G.second)) + Kill[G.first].insert(Def); +} + +using WorklistTy = llvm::SmallVector<const CFGBlock *, 5>; + +void ReachingDefinitionsCalculator::calculate() { + init(); + + // TODO: Use ForwardDataflowWorklist. + for (const std::pair<const CFGBlock *, GenSet> G : Gen) + Out[G.first] = {G.second.begin(), G.second.end()}; + + WorklistTy Worklist({cfg->begin(), cfg->end()}); + + while (!Worklist.empty()) { + const CFGBlock *N = Worklist.pop_back_val(); + + for (const CFGBlock *Pred : N->preds()) + llvm::set_union(In[N], Out[Pred]); + + bool HasChanged = + llvm::set_union(Out[N], llvm::set_difference(In[N], Kill[N])); + + if (llvm::set_union(Out[N], Gen[N])) + HasChanged = true; + + if (HasChanged) { + for (const CFGBlock *Succ : N->succs()) + Worklist.push_back(Succ); + } + } +} + +void ReachingDefinitionsCalculator::dumpGenSet() const { + llvm::errs() << "GEN sets: blockid (varname [blockid, elementid])\n"; + for (const std::pair<const CFGBlock *, GenSet> D : Gen) { + size_t BlockID = llvm::find(*cfg, D.first) - cfg->begin(); + for (const Definition &Def : D.second) { + llvm::errs() << BlockID << ' '; + Def.dump(); + llvm::errs() << '\n'; + } + } +} + +void ReachingDefinitionsCalculator::dumpKillSet() const { + llvm::errs() << "KILL sets: blockid (varname [blockid, elementid])\n"; + for (const std::pair<const CFGBlock *, DefinitionSet> D : Kill) { + size_t BlockID = llvm::find(*cfg, D.first) - cfg->begin(); + for (const Definition &Def : D.second) { + llvm::errs() << BlockID << ' '; + Def.dump(); + llvm::errs() << '\n'; + } + } +} + +void ReachingDefinitionsCalculator::dumpReachingDefinitions() const { + llvm::errs() << "Reaching definition sets: " + "blockid IN/OUT (varname [blockid, elementid])\n"; + for (const CFGBlock *B : *cfg) { + size_t BlockID = llvm::find(*cfg, B) - cfg->begin(); + if (In.count(B)) { + for (const Definition &Def : In.find(B)->second) { + llvm::errs() << BlockID << " IN "; + Def.dump(); + llvm::errs() << '\n'; + } + } + + if (Out.count(B)) { + for (const Definition &Def : Out.find(B)->second) { + llvm::errs() << BlockID << " OUT "; + Def.dump(); + llvm::errs() << '\n'; + } + } + } +} Index: clang/lib/Analysis/CMakeLists.txt =================================================================== --- clang/lib/Analysis/CMakeLists.txt +++ clang/lib/Analysis/CMakeLists.txt @@ -22,6 +22,7 @@ PostOrderCFGView.cpp ProgramPoint.cpp ReachableCode.cpp + ReachingDefinitions.cpp RetainSummaryManager.cpp ThreadSafety.cpp ThreadSafetyCommon.cpp Index: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td =================================================================== --- clang/include/clang/StaticAnalyzer/Checkers/Checkers.td +++ clang/include/clang/StaticAnalyzer/Checkers/Checkers.td @@ -1415,8 +1415,19 @@ Dependencies<[DebugContainerModeling, IteratorModeling]>, Documentation<NotDocumented>; -} // end "debug" +def GenSetDumper : Checker<"DumpGenSets">, + HelpText<"Dump the GEN sets for each block in a function">, + Documentation<NotDocumented>; + +def KillSetDumper : Checker<"DumpKillSets">, + HelpText<"Dump the KILL sets for each block in a function">, + Documentation<NotDocumented>; +def ReachingDefinitionsDumper : Checker<"DumpReachingDefinitions">, + HelpText<"Dump the reaching definitions for each block in a function">, + Documentation<NotDocumented>; + +} // end "debug" //===----------------------------------------------------------------------===// // Clone Detection Index: clang/include/clang/Analysis/Analyses/ReachingDefinitions.h =================================================================== --- /dev/null +++ clang/include/clang/Analysis/Analyses/ReachingDefinitions.h @@ -0,0 +1,247 @@ +//===--- ReachingDefinitions.h ----------------------------------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// Calculates reaching definitions for a CFG. +// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H +#define LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H + +#include "clang/AST/Decl.h" +#include "clang/ASTMatchers/ASTMatchFinder.h" +#include "clang/Analysis/AnalysisDeclContext.h" +#include "clang/Analysis/CFG.h" +#include "llvm/ADT/SmallVector.h" +#include <set> + +namespace clang { + +//===----------------------------------------------------------------------===// +// Since reaching definitions was implemented for instructions, it isn't well +// thought out what a definition of a variable is in C/C++, nor what we really +// mean under the term variable. +//===----------------------------------------------------------------------===// + +namespace ReachingDefinitionsDetail { + +//===----------------------------------------------------------------------===// +// What a 'variable' is: +// We define a variable as a VarDecl. +// TODO: How about records? Add support for fields. +// TODO: How do we describe the implicit this parameter for methods? +//===----------------------------------------------------------------------===// + +struct Variable { + const VarDecl *Var; + + Variable(const VarDecl *Var) : Var(Var) {} +}; + +struct VariableLess { + // TODO: const ref? but what if we change this to immutablelist? + bool operator()(Variable Lhs, Variable Rhs) { + return Lhs.Var < Rhs.Var; + } +}; + +//===----------------------------------------------------------------------===// +// What a 'definition of a variable' is: +// Whatever statement that writes a 'variable', or may write a 'variable'. +// TODO: What about global variables on function calls? We should introduce +// invalidations as well. +//===----------------------------------------------------------------------===// + +class Definition : public Variable { +public: + enum DefinitionKind { Write }; + + CFGBlock::ConstCFGElementRef E; + DefinitionKind Kind; + +public: + + Definition(const VarDecl *Var, CFGBlock::ConstCFGElementRef E, + DefinitionKind Kind) + : Variable(Var), E(E), Kind(Kind) {} + + const CFGBlock *getCFGBlock() const { return E.getParent(); } + + // (varname [blockid, elementid]) (reason) + void dump() const; +}; + +struct VarAndCFGElementLess { + bool operator()(Definition Lhs, Definition Rhs) const { + return std::tie(Lhs.Var, Lhs.E) < + std::tie(Rhs.Var, Rhs.E); + } +}; + +/// A set of definitions sorted only by the variable, so that each basic block +/// may only emit a single definition for any single variable. +// TODO: We need to track more then a single definition to a variable for a +// block's GEN set. Say, the static analyzer manages to prove that a potential +// invalidation definition (like a function call) didn't write the variable, we +// need to retrieve the definitions up to that point in the block. +// TODO: This is ridiculously espensive, change to ImmutableSet. +using GenSet = std::set<Definition, VariableLess>; + +/// A set of definitions sorted by the variable and the location of the +/// definition. For KILL, IN and OUT sets this is correct, because a CFGBlock +/// may kill several definitions of the same variables from different locations. +using DefinitionSet = std::set<Definition, VarAndCFGElementLess>; + +//===----------------------------------------------------------------------===// +// Determining whether a statement modifies a variable is a challanging, and +// requires using expensive-to-create matchers, and an easily extensible +// interface, hence the use of a builder class outside of the main calculator. +//===----------------------------------------------------------------------===// + +class GenSetBuilder; + +class GenSetMatcherCallback : public ast_matchers::MatchFinder::MatchCallback { +protected: + GenSetBuilder &GSBuilder; + + GenSetMatcherCallback(GenSetBuilder &GSBuilder) : GSBuilder(GSBuilder) {} +}; + +/// Responsible for building the GEN sets for each basic block. +/// +/// Since pointer escapes or function calls in general require us to generate +/// definitions that are invalidations, we need to gather all variables relevant +/// for this analysis, like parameters, locals and globals. We refer to this +/// stage as 'variable finding': +/// * Collect all non-local, visible variables +/// * Collect all local variables within the function that is used +/// +/// The actual building of GEN sets has two stages: +/// 1. For each CFGStmt in the CFGBlock, look for a statement that may be a +/// definition of an expression. ('definition finding') +/// 2. Search expressions for variables recursively. ('expression finding') +class GenSetBuilder { + using DefinitionKind = Definition::DefinitionKind; + + // Fields that shouldn't change after the construction of the builder object. + + llvm::SmallVector<std::unique_ptr<GenSetMatcherCallback>, 8> Callbacks; + + ast_matchers::MatchFinder VariableFinder; + ast_matchers::MatchFinder DefinitionFinder; + ast_matchers::MatchFinder ExpressionFinder; + + const Decl *D; + ASTContext *Context = nullptr; + + // TODO: This does not contain non-local visible variables. + std::set<Variable, VariableLess> AllVariables; + + // Fields that are changed at and during each GEN set construction. + + GenSet *CurrentGenSet; + Optional<CFGBlock::ConstCFGElementRef> CurrentCFGElem; + + // This field is used for definition finder matchers to communicate with + // expression finder matchers what kind of a definition we need to note. This + // isn't too pretty, but the interaction in between matcher callbacks are + // inherently awkward. + DefinitionKind CurrentKind = Definition::Write; + +private: + // TODO: Make this public and allow custom matchers to be added? + template <ast_matchers::MatchFinder GenSetBuilder::*Finder, + class GenSetMatcherCallbackTy> + void addMatcher() { + Callbacks.emplace_back(std::make_unique<GenSetMatcherCallbackTy>(*this)); + (this->*Finder) + .addMatcher(GenSetMatcherCallbackTy::getMatcher(), + Callbacks.back().get()); + } + +public: + GenSetBuilder(const Decl *D); + + //===--------------------------------------------------------------------===// + // Methods for retrieving a GEN set. + //===--------------------------------------------------------------------===// + + void getGenSetForCFGBlock(const CFGBlock *B, GenSet &Gen); + + //===--------------------------------------------------------------------===// + // Utility methods for building a GEN set. These are public because the + // callback objects will need to call them. + //===--------------------------------------------------------------------===// + + /// When we find a definition to an *expression*, we need to see if that + /// expression is a variable, or some other expression that needs further + /// processing, like in this case: + /// (a, b) = 10; + void handleExpr(const Expr *E, DefinitionKind Kind); + + /// Insert a new defintion of a variable into the current GEN set. + void insertToGenSet(const VarDecl *Var, DefinitionKind Kind); + + void insertToGenSet(const VarDecl *Var) { insertToGenSet(Var, CurrentKind); } + + /// Called during the variable finding stage. + void addVariable(const VarDecl *Var); +}; + +} // end of namespace ReachingDefinitionsDetail + +/// Calculates reaching definitions for each basic block. This calculation +/// doesn't try to argue about aliasing, meaning that some definitions are +/// definite write (we know that the variable it written), and some are a result +/// of invalidation, like passing a variable as a non-const reference to a +/// function. +class ReachingDefinitionsCalculator : public ManagedAnalysis { + using GenSetBuilder = ReachingDefinitionsDetail::GenSetBuilder; + +public: + using GenSet = ReachingDefinitionsDetail::GenSet; + using DefinitionKind = ReachingDefinitionsDetail::Definition::DefinitionKind; + using DefinitionSet = ReachingDefinitionsDetail::DefinitionSet; + + void calculate(); + + void dumpKillSet() const; + void dumpGenSet() const; + void dumpReachingDefinitions() const; + + static ReachingDefinitionsCalculator *create(AnalysisDeclContext &Ctx) { + return new ReachingDefinitionsCalculator(Ctx.getDecl(), Ctx.getCFG()); + } + + static const void *getTag() { + static int x; + return &x; + } + +private: + ReachingDefinitionsCalculator(const Decl *D, const CFG *cfg); + + void init(); + + const CFG *cfg; + GenSetBuilder GSBuilder; + + // TODO: Make the GEN set public to allow clients to remove definitions, or + // possibly mark an invalidation as write. + // TODO: Should we change to ImmutableMap? Does this matter that much? + std::map<const CFGBlock *, GenSet> Gen; + std::map<const CFGBlock *, DefinitionSet> Kill; + +public: + std::map<const CFGBlock *, DefinitionSet> In; + std::map<const CFGBlock *, DefinitionSet> Out; +}; + +} // end of namespace clang + +#endif // LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H
_______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits