Szelethus created this revision.
Szelethus added reviewers: NoQ, xazax.hun, martong, whisperity, rnkovacs, 
dcoughlin, baloghadamsoftware.
Szelethus added a project: clang.
Herald added subscribers: cfe-commits, DenisDvlp, steakhal, Charusso, 
gamesh411, dkrupp, donat.nagy, mikhail.ramalho, a.sidorin, szepet, mgorny.

The following revision adds the basic infrastructure for a reaching definitions 
algorithm for C++.

Short description & motivation
==============================

This is a dataflow algorithm designed to find a set of definitions (for example 
assignments) to variables in a given `CFGBlock`. To demonstrate, in the 
following code snippet:

  int flag;
  
  void foo();
  
  void f() {
    int *x = nullptr;
    flag = 1;
  
    foo();
    if (flag)
      x = new int;
  
    foo();
    if (flag)
      *x = 5;
  }

The CFG would look like this:

  
                     -> [B3] ->    -> [B1] ->
                    /          \  /          \
  [B5 (ENTRY)] -> [B4] ------> [B2] ---> [B0 (EXIT)]

Should `foo()` change `flag`'s value on the first function call to `false`, 
then `true` on the second, a null pointer dereference error would occur. Using 
a reaching definitions calculator, we can retrieve that the set of ingoing 
definitions to B1 <https://reviews.llvm.org/B1> is `{(flag, [B2]), (x, [B3]), 
(x, [B4])}`. The set hints that `x` has a reaching definitions that would not 
have caused a nullpointer dereference.

The algorithm
=============

A reaching definition for a given instruction is an earlier instruction whose 
target variable can reach (be assigned to) the given one without an intervening 
assignment. The similarly named reaching definitions is a data-flow analysis 
which statically determines which definitions may reach a given point in the 
code [1].

The set of ingoing and outgoing reaching definitions are calculated for each 
basic block with set operations on GEN and KILL sets. This could be further 
fine-grained so the RD sets would be a property of a statement, rather then an 
entire `CFGBlock`, but I didn't want to delay the publishing of this patch any 
further, and I believe that such a change wouldn't hurt the infrastructure much.

Implementation
==============

As the formal definition would suggest, this was conceived for instructions, 
which is why even such terms as "variable" or "definition" can be hard to 
define for C++. For this reason, the patch spares a lot of LOC for 
documentation and reasoning. While the algorithm itself is simple (and is 
implemented quite literally from [1]), the problematic part of this is the 
generation of of GEN and KILL sets. I tried to introduce an infrastructure that 
can tolerate a lot of things I have inevitable forgotten (or left for followup 
patches) with the use of easy to add AST matchers.

Immediate questions to address
==============================

> We already have 2 dataflow algorithms implemented in the Analysis library: 
> UninitializedObject and LiveVariables. Reaching definitions doesn't sound all 
> too dissimilar. Why are we adding hundreds of LOC here? Can't we reuse some 
> of it?

UninitializedObject and LiveVariables are practically the same algorithm with 
minimal differences. Despite this, they take up ~750 and ~1050 LOC 
respectively. Both of those algorithms can be expressed with the use of GEN and 
KILL sets, yet none of them are, and they duplicate a lot of logic. It isn't 
terribly obvious however how their logic could be (or really, should be) merged.

Shockingly, we have no GEN and KILL set implementations in Clang. I think that 
is the main addition of this patch, even if it unfortunately duplicates some 
logic. The previously mentioned two analyses however could serve as an 
inspiration.

> UninitializedObject and LiveVariables uses ASTVisitors rather than 
> ASTMatchers. The latter is more expensive. Why did we go with them?

Matchers are more expressive. For instance, how would you find `s.a.x.z` with 
visitors? Its doable, but requires to keep track of a lot of state and would 
make the implementation ugly. I don't have complete confidence in my decision 
here, so I welcome alternative suggestions or counterarguments.

> What are the main things to get done?

In order:

- Finalize the infrastructure for GEN set generation.
- Make it possible to calculate RD sets for statements, not only blocks.
- Improve the interface of `ReachingDefinitionsCalculator`. What we like to 
query? Most probably the set of ingoing RDs for a variable at a given statement.
- Performance: use immutable data structures and a better CFG traversal 
strategy.

Further reading
===============

My GSoC project: https://szelethus.github.io/gsoc2019/ (this has a lot of 
pointers to related discussions in the 'Reaching Definitions Analysis' section)

[cfe-dev] Dataflow analyses in Clang -- why do we not have GEN and      KILL 
sets? http://lists.llvm.org/pipermail/cfe-dev/2020-March/064893.html

[analyzer][WIP] Implement a primitive reaching definitions analysis D64991 
<https://reviews.llvm.org/D64991>

References
==========

My main source was wikipedia:
[1] https://en.wikipedia.org/wiki/Reaching_definition

I read the following articles, but they didn't give me the information I needed:

Tonella, Paolo, et al. "Variable precision reaching definitions analysis for 
software maintenance." Proceedings. First Euromicro Conference on Software 
Maintenance and Reengineering. IEEE, 1997.

Collard, Jean-François, and Jens Knoop. "A comparative study of 
reaching-definitions analyses." (1998).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D76287

Files:
  clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/Analysis/CMakeLists.txt
  clang/lib/Analysis/ReachingDefinitions.cpp
  clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
  clang/test/Analysis/dump-definitions.cpp

Index: clang/test/Analysis/dump-definitions.cpp
===================================================================
--- /dev/null
+++ clang/test/Analysis/dump-definitions.cpp
@@ -0,0 +1,153 @@
+// RUN: %clang_analyze_cc1 %s \
+// RUN:   -analyzer-checker=debug.DumpCFG \
+// RUN:   -analyzer-checker=debug.DumpGenSets \
+// RUN:   -analyzer-checker=debug.DumpKillSets \
+// RUN:   -analyzer-checker=debug.DumpReachingDefinitions \
+// RUN:   2>&1 | FileCheck %s
+
+int getInt();
+int *getIntPtr();
+bool coin();
+
+void single_vardecl_in_declstmt() {
+  int *ptr = getIntPtr();
+}
+// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)]
+
+// CHECK:      GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 1 (ptr, [1, 3]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write>
+
+void multiple_vardecl_in_declstmt() {
+  int *ptr = getIntPtr(), i;
+}
+// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)]
+
+// CHECK:      GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 1 (ptr, [1, 3]) <write>
+// CHECK-NEXT: 1 (i, [1, 4]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write>
+// CHECK-NEXT: 0 IN (i, [1, 4]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write>
+// CHECK-NEXT: 0 OUT (i, [1, 4]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write>
+// CHECK-NEXT: 1 OUT (i, [1, 4]) <write>
+
+void function_and_vardecl_in_declstmt() {
+  int *ptr = getIntPtr(), a();
+}
+// [B2 (ENTRY)] -> [B1] -> [B0 (EXIT)]
+
+// CHECK:      GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 1 (ptr, [1, 3]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (ptr, [1, 3]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [1, 3]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [1, 3]) <write>
+
+void single_def_in_same_block() {
+  int *ptr = getIntPtr();
+
+  if (coin())
+    ptr = 0;
+
+  if (!ptr)
+    *ptr = 5;
+}
+//                    -> [B3] ->    -> [B1] ->
+//                   /          \  /          \
+// [B5 (ENTRY)] -> [B4] ------> [B2] ---> [B0 (EXIT)]
+
+// CHECK:      GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 3 (ptr, [3, 3]) <write>
+// CHECK-NEXT: 4 (ptr, [4, 3]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 3 (ptr, [4, 3]) <write>
+// CHECK-NEXT: 4 (ptr, [3, 3]) <write>
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (ptr, [3, 3]) <write>
+// CHECK-NEXT: 0 IN (ptr, [4, 3]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [3, 3]) <write>
+// CHECK-NEXT: 0 OUT (ptr, [4, 3]) <write>
+// CHECK-NEXT: 1 IN (ptr, [3, 3]) <write>
+// CHECK-NEXT: 1 IN (ptr, [4, 3]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [3, 3]) <write>
+// CHECK-NEXT: 1 OUT (ptr, [4, 3]) <write>
+// CHECK-NEXT: 2 IN (ptr, [3, 3]) <write>
+// CHECK-NEXT: 2 IN (ptr, [4, 3]) <write>
+// CHECK-NEXT: 2 OUT (ptr, [3, 3]) <write>
+// CHECK-NEXT: 2 OUT (ptr, [4, 3]) <write>
+// CHECK-NEXT: 3 IN (ptr, [4, 3]) <write>
+// CHECK-NEXT: 3 OUT (ptr, [3, 3]) <write>
+// CHECK-NEXT: 4 OUT (ptr, [4, 3]) <write>
+
+void different_assignments() {
+  int i = getInt();
+
+  if (coin())
+    i = 0;
+
+  i += 3;
+
+  if (!coin())
+    i -= 2;
+
+  i *= 9;
+
+  if (i = 0)
+    ;
+}
+//                    -> [B5] ->    -> [B3] ->    -> [B1] ->
+//                   /          \  /          \  /          \
+// [B7 (ENTRY)] -> [B6] ------> [B4] -------> [B2] ---> [B0 (EXIT)]
+
+// CHECK:      GEN sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 2 (i, [2, 5]) <write>
+// CHECK-NEXT: 3 (i, [3, 2]) <write>
+// CHECK-NEXT: 4 (i, [4, 2]) <write>
+// CHECK-NEXT: 5 (i, [5, 2]) <write>
+// CHECK-NEXT: 6 (i, [6, 3]) <write>
+// CHECK-NEXT: KILL sets: blockid (varname [blockid, elementid])
+// CHECK-NEXT: 2 (i, [3, 2]) <write>
+// CHECK-NEXT: 2 (i, [4, 2]) <write>
+// CHECK-NEXT: 2 (i, [5, 2]) <write>
+// CHECK-NEXT: 2 (i, [6, 3]) <write>
+// CHECK-NEXT: 3 (i, [2, 5]) <write>
+// CHECK-NEXT: 3 (i, [4, 2]) <write>
+// CHECK-NEXT: 3 (i, [5, 2]) <write>
+// CHECK-NEXT: 3 (i, [6, 3]) <write>
+// CHECK-NEXT: 4 (i, [2, 5]) <write>
+// CHECK-NEXT: 4 (i, [3, 2]) <write>
+// CHECK-NEXT: 4 (i, [5, 2]) <write>
+// CHECK-NEXT: 4 (i, [6, 3]) <write>
+// CHECK-NEXT: 5 (i, [2, 5]) <write>
+// CHECK-NEXT: 5 (i, [3, 2]) <write>
+// CHECK-NEXT: 5 (i, [4, 2]) <write>
+// CHECK-NEXT: 5 (i, [6, 3]) <write>
+// CHECK-NEXT: 6 (i, [2, 5]) <write>
+// CHECK-NEXT: 6 (i, [3, 2]) <write>
+// CHECK-NEXT: 6 (i, [4, 2]) <write>
+// CHECK-NEXT: 6 (i, [5, 2]) <write>
+// CHECK-NEXT: Reaching definition sets: blockid IN/OUT (varname [blockid, elementid])
+// CHECK-NEXT: 0 IN (i, [2, 5]) <write>
+// CHECK-NEXT: 0 OUT (i, [2, 5]) <write>
+// CHECK-NEXT: 1 IN (i, [2, 5]) <write>
+// CHECK-NEXT: 1 OUT (i, [2, 5]) <write>
+// CHECK-NEXT: 2 IN (i, [3, 2]) <write>
+// CHECK-NEXT: 2 IN (i, [4, 2]) <write>
+// CHECK-NEXT: 2 OUT (i, [2, 5]) <write>
+// CHECK-NEXT: 3 IN (i, [4, 2]) <write>
+// CHECK-NEXT: 3 OUT (i, [3, 2]) <write>
+// CHECK-NEXT: 4 IN (i, [5, 2]) <write>
+// CHECK-NEXT: 4 IN (i, [6, 3]) <write>
+// CHECK-NEXT: 4 OUT (i, [4, 2]) <write>
+// CHECK-NEXT: 5 IN (i, [6, 3]) <write>
+// CHECK-NEXT: 5 OUT (i, [5, 2]) <write>
+// CHECK-NEXT: 6 OUT (i, [6, 3]) <write>
Index: clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
+++ clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
@@ -10,12 +10,13 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
 #include "clang/Analysis/Analyses/Dominators.h"
 #include "clang/Analysis/Analyses/LiveVariables.h"
+#include "clang/Analysis/Analyses/ReachingDefinitions.h"
 #include "clang/Analysis/CallGraph.h"
-#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
 #include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
@@ -102,6 +103,72 @@
   return true;
 }
 
+//===----------------------------------------------------------------------===//
+// GenSetDumper
+//===----------------------------------------------------------------------===//
+
+namespace {
+class GenSetDumper : public Checker<check::ASTCodeBody> {
+public:
+  void checkASTCodeBody(const Decl *D, AnalysisManager &mgr,
+                        BugReporter &BR) const {
+    if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D))
+      AC->getAnalysis<ReachingDefinitionsCalculator>()->dumpGenSet();
+  }
+};
+} // namespace
+
+void ento::registerGenSetDumper(CheckerManager &mgr) {
+  mgr.registerChecker<GenSetDumper>();
+}
+
+bool ento::shouldRegisterGenSetDumper(const LangOptions &LO) { return true; }
+
+//===----------------------------------------------------------------------===//
+// KillSetDumper
+//===----------------------------------------------------------------------===//
+
+namespace {
+class KillSetDumper : public Checker<check::ASTCodeBody> {
+public:
+  void checkASTCodeBody(const Decl *D, AnalysisManager &mgr,
+                        BugReporter &BR) const {
+    if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D))
+      AC->getAnalysis<ReachingDefinitionsCalculator>()->dumpKillSet();
+  }
+};
+} // namespace
+
+void ento::registerKillSetDumper(CheckerManager &mgr) {
+  mgr.registerChecker<KillSetDumper>();
+}
+
+bool ento::shouldRegisterKillSetDumper(const LangOptions &LO) { return true; }
+
+//===----------------------------------------------------------------------===//
+// ReachingDefinitionsDumper
+//===----------------------------------------------------------------------===//
+
+namespace {
+class ReachingDefinitionsDumper : public Checker<check::ASTCodeBody> {
+public:
+  void checkASTCodeBody(const Decl *D, AnalysisManager &mgr,
+                        BugReporter &BR) const {
+    if (AnalysisDeclContext *AC = mgr.getAnalysisDeclContext(D))
+      AC->getAnalysis<ReachingDefinitionsCalculator>()
+          ->dumpReachingDefinitions();
+  }
+};
+} // namespace
+
+void ento::registerReachingDefinitionsDumper(CheckerManager &mgr) {
+  mgr.registerChecker<ReachingDefinitionsDumper>();
+}
+
+bool ento::shouldRegisterReachingDefinitionsDumper(const LangOptions &LO) {
+  return true;
+}
+
 //===----------------------------------------------------------------------===//
 // LiveVariablesDumper
 //===----------------------------------------------------------------------===//
Index: clang/lib/Analysis/ReachingDefinitions.cpp
===================================================================
--- /dev/null
+++ clang/lib/Analysis/ReachingDefinitions.cpp
@@ -0,0 +1,319 @@
+//===--- ReachableDefinitions.cpp -------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Calculates reachable definitions for a variable.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Analyses/ReachingDefinitions.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/DeclCXX.h"
+#include "clang/AST/DeclLookups.h"
+#include "clang/AST/Expr.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/ASTMatchers/ASTMatchers.h"
+#include "clang/ASTMatchers/ASTMatchersInternal.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Basic/LLVM.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SetOperations.h"
+#include "llvm/Support/ErrorHandling.h"
+#include <memory>
+
+using namespace clang;
+using namespace ReachingDefinitionsDetail;
+using namespace ast_matchers;
+
+using DefinitionKind = Definition::DefinitionKind;
+
+//===----------------------------------------------------------------------===//
+// Utility functions.
+//===----------------------------------------------------------------------===//
+
+static bool killsVar(const Definition &Victim, const GenSet &Set) {
+  for (const Definition &Perpetrator : Set)
+    if (Victim.Var == Perpetrator.Var)
+      return true;
+  return false;
+}
+
+static StringRef describeDefinitionKind(DefinitionKind K) {
+  switch (K) {
+  case Definition::Write:
+    return "write";
+  }
+}
+
+//===----------------------------------------------------------------------===//
+// Methods of Definition.
+//===----------------------------------------------------------------------===//
+
+void Definition::dump() const {
+  llvm::errs() << "(" << Var->getNameAsString();
+
+  llvm::errs() << ", [" << getCFGBlock()->getIndexInCFG() << ", "
+               << E.getIndexInBlock() << "])"
+               << " <" << (describeDefinitionKind(Kind)) << ">";
+}
+
+//===----------------------------------------------------------------------===//
+// Matcher callbacks for constructing GEN sets for the variable finding stage.
+//===----------------------------------------------------------------------===//
+
+class VariableCollectorCB : public GenSetMatcherCallback {
+public:
+  VariableCollectorCB(GenSetBuilder &GSBuilder)
+      : GenSetMatcherCallback(GSBuilder) {}
+
+  static internal::Matcher<Stmt> getMatcher() {
+    return stmt(forEachDescendant(declRefExpr().bind("var")));
+  }
+
+  virtual void run(const MatchFinder::MatchResult &Result) override {
+    const auto *E = Result.Nodes.getNodeAs<DeclRefExpr>("var");
+    assert(E);
+    if (const VarDecl *Var = dyn_cast<VarDecl>(E->getDecl()))
+      GSBuilder.addVariable(Var);
+  }
+};
+
+//===----------------------------------------------------------------------===//
+// Matcher callbacks for constructing GEN sets for the definition finding stage.
+//===----------------------------------------------------------------------===//
+
+class AssignmentOperatorCB : public GenSetMatcherCallback {
+public:
+  AssignmentOperatorCB(GenSetBuilder &GSBuilder)
+      : GenSetMatcherCallback(GSBuilder) {}
+
+  static internal::Matcher<Stmt> getMatcher() {
+    return binaryOperator(isAssignmentOperator()).bind("assign");
+  }
+
+  virtual void run(const MatchFinder::MatchResult &Result) override {
+    const auto *BO = Result.Nodes.getNodeAs<BinaryOperator>("assign");
+    assert(BO);
+    GSBuilder.handleExpr(BO->getLHS(), DefinitionKind::Write);
+  }
+};
+
+class DeclarationCB : public GenSetMatcherCallback {
+public:
+  DeclarationCB(GenSetBuilder &GSBuilder) : GenSetMatcherCallback(GSBuilder) {}
+
+  static internal::Matcher<Stmt> getMatcher() {
+    return declStmt().bind("decls");
+  }
+
+  virtual void run(const MatchFinder::MatchResult &Result) override {
+    const auto *DS = Result.Nodes.getNodeAs<DeclStmt>("decls");
+    assert(DS);
+
+    for (const Decl *D : DS->decls()) {
+      const auto *Var = dyn_cast<VarDecl>(D);
+      if (!Var)
+        continue;
+
+      GSBuilder.insertToGenSet(Var, DefinitionKind::Write);
+    }
+  }
+};
+
+//===----------------------------------------------------------------------===//
+// Matcher callbacks for constructing GEN sets for the expression finding stage.
+//===----------------------------------------------------------------------===//
+
+class DeclRefExprCB : public GenSetMatcherCallback {
+public:
+  DeclRefExprCB(GenSetBuilder &GSBuilder) : GenSetMatcherCallback(GSBuilder) {}
+
+  static internal::Matcher<Stmt> getMatcher() {
+    return declRefExpr(to(varDecl().bind("var")));
+  }
+
+  virtual void run(const MatchFinder::MatchResult &Result) override {
+    const auto *Var = Result.Nodes.getNodeAs<VarDecl>("var");
+    assert(Var);
+    GSBuilder.insertToGenSet(Var);
+  }
+};
+
+//===----------------------------------------------------------------------===//
+// Methods of GenSetBuilder.
+//===----------------------------------------------------------------------===//
+
+GenSetBuilder::GenSetBuilder(const Decl *D)
+    : D(D), Context(&D->getASTContext()) {
+
+  // TODO: Should we match the entire TU for nested static variables?
+  addMatcher<&GenSetBuilder::VariableFinder, VariableCollectorCB>();
+
+  // TODO: Elvis operator (?:).
+  // TODO: C++ MemberExpr?
+  // TODO: ParenExpr? (((a))) = 3;
+  addMatcher<&GenSetBuilder::ExpressionFinder, DeclRefExprCB>();
+
+  // TODO: Destructor calls? Should we be THAT conservative?
+  // TODO: Regular function calls?
+  // TODO: Moving an object?
+  // TODO: Method calls?
+  // TODO: Analyzing a method?
+  // TODO: What do you do with Objective.*?
+  // TODO: Exceptions?
+  addMatcher<&GenSetBuilder::DefinitionFinder, AssignmentOperatorCB>();
+  addMatcher<&GenSetBuilder::DefinitionFinder, DeclarationCB>();
+
+  // Collect all used variables.
+  if (const auto *FD = dyn_cast<FunctionDecl>(D)) {
+    VariableFinder.match(*FD, FD->getASTContext());
+    for (const ParmVarDecl *Param : FD->parameters())
+      addVariable(Param);
+  }
+
+  // TODO: Collect all visible, non-local variables as well.
+}
+
+void GenSetBuilder::getGenSetForCFGBlock(const CFGBlock *B, GenSet &Gen) {
+  if (B->empty())
+    return;
+
+  VariableFinder.match(*D, D->getASTContext());
+
+  CurrentGenSet = &Gen;
+
+  for (CFGBlock::ConstCFGElementRef E : B->rrefs()) {
+    if (E->getKind() != CFGElement::Kind::Statement)
+      continue;
+    CurrentCFGElem = E;
+
+    const Stmt *S = E->castAs<CFGStmt>().getStmt();
+    assert(S);
+    DefinitionFinder.match(*S, D->getASTContext());
+  }
+
+  CurrentGenSet = nullptr;
+}
+
+void GenSetBuilder::handleExpr(const Expr *E, DefinitionKind Kind) {
+  CurrentKind = Kind;
+  ExpressionFinder.match(*E, *Context);
+}
+
+void GenSetBuilder::insertToGenSet(const VarDecl *Var,
+                                   DefinitionKind Kind) {
+  CurrentGenSet->emplace(Var, *CurrentCFGElem, Kind);
+}
+
+void GenSetBuilder::addVariable(const VarDecl *Var) {
+  AllVariables.emplace(Var);
+}
+
+//===----------------------------------------------------------------------===//
+// Methods of ReachingDefinitionsCalculator.
+//===----------------------------------------------------------------------===//
+
+ReachingDefinitionsCalculator::ReachingDefinitionsCalculator(const Decl *D,
+                                                             const CFG *cfg)
+    : cfg(cfg), GSBuilder(D) {
+
+  for (const CFGBlock *B : *cfg)
+    GSBuilder.getGenSetForCFGBlock(B, Gen[B]);
+  // TODO: We ignore parameters! We should collect them as definitions as well.
+
+  calculate();
+}
+
+void ReachingDefinitionsCalculator::init() {
+  llvm::SmallVector<Definition, 16> AllDefinitions;
+  for (const std::pair<const CFGBlock *, GenSet> G : Gen)
+    for (const Definition &Def : G.second)
+      AllDefinitions.push_back(Def);
+
+  for (const std::pair<const CFGBlock *, GenSet> G : Gen)
+    for (const Definition &Def : AllDefinitions)
+      if (G.first != Def.getCFGBlock() && killsVar(Def, G.second))
+        Kill[G.first].insert(Def);
+}
+
+using WorklistTy = llvm::SmallVector<const CFGBlock *, 5>;
+
+void ReachingDefinitionsCalculator::calculate() {
+  init();
+
+  // TODO: Use ForwardDataflowWorklist.
+  for (const std::pair<const CFGBlock *, GenSet> G : Gen)
+    Out[G.first] = {G.second.begin(), G.second.end()};
+
+  WorklistTy Worklist({cfg->begin(), cfg->end()});
+
+  while (!Worklist.empty()) {
+    const CFGBlock *N = Worklist.pop_back_val();
+
+    for (const CFGBlock *Pred : N->preds())
+      llvm::set_union(In[N], Out[Pred]);
+
+    bool HasChanged =
+        llvm::set_union(Out[N], llvm::set_difference(In[N], Kill[N]));
+
+    if (llvm::set_union(Out[N], Gen[N]))
+      HasChanged = true;
+
+    if (HasChanged) {
+      for (const CFGBlock *Succ : N->succs())
+        Worklist.push_back(Succ);
+    }
+  }
+}
+
+void ReachingDefinitionsCalculator::dumpGenSet() const {
+  llvm::errs() << "GEN sets: blockid (varname [blockid, elementid])\n";
+  for (const std::pair<const CFGBlock *, GenSet> D : Gen) {
+    size_t BlockID = llvm::find(*cfg, D.first) - cfg->begin();
+    for (const Definition &Def : D.second) {
+      llvm::errs() << BlockID << ' ';
+      Def.dump();
+      llvm::errs() << '\n';
+    }
+  }
+}
+
+void ReachingDefinitionsCalculator::dumpKillSet() const {
+  llvm::errs() << "KILL sets: blockid (varname [blockid, elementid])\n";
+  for (const std::pair<const CFGBlock *, DefinitionSet> D : Kill) {
+    size_t BlockID = llvm::find(*cfg, D.first) - cfg->begin();
+    for (const Definition &Def : D.second) {
+      llvm::errs() << BlockID << ' ';
+      Def.dump();
+      llvm::errs() << '\n';
+    }
+  }
+}
+
+void ReachingDefinitionsCalculator::dumpReachingDefinitions() const {
+  llvm::errs() << "Reaching definition sets: "
+                  "blockid IN/OUT (varname [blockid, elementid])\n";
+  for (const CFGBlock *B : *cfg) {
+    size_t BlockID = llvm::find(*cfg, B) - cfg->begin();
+    if (In.count(B)) {
+      for (const Definition &Def : In.find(B)->second) {
+        llvm::errs() << BlockID << " IN ";
+        Def.dump();
+        llvm::errs() << '\n';
+      }
+    }
+
+    if (Out.count(B)) {
+      for (const Definition &Def : Out.find(B)->second) {
+        llvm::errs() << BlockID << " OUT ";
+        Def.dump();
+        llvm::errs() << '\n';
+      }
+    }
+  }
+}
Index: clang/lib/Analysis/CMakeLists.txt
===================================================================
--- clang/lib/Analysis/CMakeLists.txt
+++ clang/lib/Analysis/CMakeLists.txt
@@ -22,6 +22,7 @@
   PostOrderCFGView.cpp
   ProgramPoint.cpp
   ReachableCode.cpp
+  ReachingDefinitions.cpp
   RetainSummaryManager.cpp
   ThreadSafety.cpp
   ThreadSafetyCommon.cpp
Index: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
===================================================================
--- clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -1415,8 +1415,19 @@
   Dependencies<[DebugContainerModeling, IteratorModeling]>,
   Documentation<NotDocumented>;
 
-} // end "debug"
+def GenSetDumper : Checker<"DumpGenSets">,
+  HelpText<"Dump the GEN sets for each block in a function">,
+  Documentation<NotDocumented>;
+
+def KillSetDumper : Checker<"DumpKillSets">,
+  HelpText<"Dump the KILL sets for each block in a function">,
+  Documentation<NotDocumented>;
 
+def ReachingDefinitionsDumper : Checker<"DumpReachingDefinitions">,
+  HelpText<"Dump the reaching definitions for each block in a function">,
+  Documentation<NotDocumented>;
+
+} // end "debug"
 
 //===----------------------------------------------------------------------===//
 // Clone Detection
Index: clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
===================================================================
--- /dev/null
+++ clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
@@ -0,0 +1,247 @@
+//===--- ReachingDefinitions.h ----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Calculates reaching definitions for a CFG.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H
+
+#include "clang/AST/Decl.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "llvm/ADT/SmallVector.h"
+#include <set>
+
+namespace clang {
+
+//===----------------------------------------------------------------------===//
+// Since reaching definitions was implemented for instructions, it isn't well
+// thought out what a definition of a variable is in C/C++, nor what we really
+// mean under the term variable.
+//===----------------------------------------------------------------------===//
+
+namespace ReachingDefinitionsDetail {
+
+//===----------------------------------------------------------------------===//
+// What a 'variable' is:
+// We define a variable as a VarDecl.
+// TODO: How about records? Add support for fields.
+// TODO: How do we describe the implicit this parameter for methods?
+//===----------------------------------------------------------------------===//
+
+struct Variable {
+  const VarDecl *Var;
+
+  Variable(const VarDecl *Var) : Var(Var) {}
+};
+
+struct VariableLess {
+  // TODO: const ref? but what if we change this to immutablelist?
+  bool operator()(Variable Lhs, Variable Rhs) {
+    return Lhs.Var < Rhs.Var;
+  }
+};
+
+//===----------------------------------------------------------------------===//
+// What a 'definition of a variable' is:
+// Whatever statement that writes a 'variable', or may write a 'variable'.
+// TODO: What about global variables on function calls? We should introduce
+// invalidations as well.
+//===----------------------------------------------------------------------===//
+
+class Definition : public Variable {
+public:
+  enum DefinitionKind { Write };
+
+  CFGBlock::ConstCFGElementRef E;
+  DefinitionKind Kind;
+
+public:
+
+  Definition(const VarDecl *Var, CFGBlock::ConstCFGElementRef E,
+             DefinitionKind Kind)
+      : Variable(Var), E(E), Kind(Kind) {}
+
+  const CFGBlock *getCFGBlock() const { return E.getParent(); }
+
+  // (varname [blockid, elementid]) (reason)
+  void dump() const;
+};
+
+struct VarAndCFGElementLess {
+  bool operator()(Definition Lhs, Definition Rhs) const {
+    return std::tie(Lhs.Var, Lhs.E) <
+           std::tie(Rhs.Var, Rhs.E);
+  }
+};
+
+/// A set of definitions sorted only by the variable, so that each basic block
+/// may only emit a single definition for any single variable.
+// TODO: We need to track more then a single definition to a variable for a
+// block's GEN set. Say, the static analyzer manages to prove that a potential
+// invalidation definition (like a function call) didn't write the variable, we
+// need to retrieve the definitions up to that point in the block.
+// TODO: This is ridiculously espensive, change to ImmutableSet.
+using GenSet = std::set<Definition, VariableLess>;
+
+/// A set of definitions sorted by the variable and the location of the
+/// definition. For KILL, IN and OUT sets this is correct, because a CFGBlock
+/// may kill several definitions of the same variables from different locations.
+using DefinitionSet = std::set<Definition, VarAndCFGElementLess>;
+
+//===----------------------------------------------------------------------===//
+// Determining whether a statement modifies a variable is a challanging, and
+// requires using expensive-to-create matchers, and an easily extensible
+// interface, hence the use of a builder class outside of the main calculator.
+//===----------------------------------------------------------------------===//
+
+class GenSetBuilder;
+
+class GenSetMatcherCallback : public ast_matchers::MatchFinder::MatchCallback {
+protected:
+  GenSetBuilder &GSBuilder;
+
+  GenSetMatcherCallback(GenSetBuilder &GSBuilder) : GSBuilder(GSBuilder) {}
+};
+
+/// Responsible for building the GEN sets for each basic block.
+///
+/// Since pointer escapes or function calls in general require us to generate
+/// definitions that are invalidations, we need to gather all variables relevant
+/// for this analysis, like parameters, locals and globals. We refer to this
+/// stage as 'variable finding':
+///   * Collect all non-local, visible variables
+///   * Collect all local variables within the function that is used
+///
+/// The actual building of GEN sets has two stages:
+///   1. For each CFGStmt in the CFGBlock, look for a statement that may be a
+///      definition of an expression. ('definition finding')
+///   2. Search expressions for variables recursively. ('expression finding')
+class GenSetBuilder {
+  using DefinitionKind = Definition::DefinitionKind;
+
+  // Fields that shouldn't change after the construction of the builder object.
+
+  llvm::SmallVector<std::unique_ptr<GenSetMatcherCallback>, 8> Callbacks;
+
+  ast_matchers::MatchFinder VariableFinder;
+  ast_matchers::MatchFinder DefinitionFinder;
+  ast_matchers::MatchFinder ExpressionFinder;
+
+  const Decl *D;
+  ASTContext *Context = nullptr;
+
+  // TODO: This does not contain non-local visible variables.
+  std::set<Variable, VariableLess> AllVariables;
+
+  // Fields that are changed at and during each GEN set construction.
+
+  GenSet *CurrentGenSet;
+  Optional<CFGBlock::ConstCFGElementRef> CurrentCFGElem;
+
+  // This field is used for definition finder matchers to communicate with 
+  // expression finder matchers what kind of a definition we need to note. This
+  // isn't too pretty, but the interaction in between matcher callbacks are
+  // inherently awkward.
+  DefinitionKind CurrentKind = Definition::Write;
+
+private:
+  // TODO: Make this public and allow custom matchers to be added?
+  template <ast_matchers::MatchFinder GenSetBuilder::*Finder,
+            class GenSetMatcherCallbackTy>
+  void addMatcher() {
+    Callbacks.emplace_back(std::make_unique<GenSetMatcherCallbackTy>(*this));
+    (this->*Finder)
+        .addMatcher(GenSetMatcherCallbackTy::getMatcher(),
+                    Callbacks.back().get());
+  }
+
+public:
+  GenSetBuilder(const Decl *D);
+
+  //===--------------------------------------------------------------------===//
+  // Methods for retrieving a GEN set.
+  //===--------------------------------------------------------------------===//
+
+  void getGenSetForCFGBlock(const CFGBlock *B, GenSet &Gen);
+
+  //===--------------------------------------------------------------------===//
+  // Utility methods for building a GEN set. These are public because the
+  // callback objects will need to call them.
+  //===--------------------------------------------------------------------===//
+
+  /// When we find a definition to an *expression*, we need to see if that
+  /// expression is a variable, or some other expression that needs further
+  /// processing, like in this case:
+  ///   (a, b) = 10;
+  void handleExpr(const Expr *E, DefinitionKind Kind);
+
+  /// Insert a new defintion of a variable into the current GEN set.
+  void insertToGenSet(const VarDecl *Var, DefinitionKind Kind);
+
+  void insertToGenSet(const VarDecl *Var) { insertToGenSet(Var, CurrentKind); }
+
+  /// Called during the variable finding stage.
+  void addVariable(const VarDecl *Var);
+};
+
+} // end of namespace ReachingDefinitionsDetail
+
+/// Calculates reaching definitions for each basic block. This calculation
+/// doesn't try to argue about aliasing, meaning that some definitions are
+/// definite write (we know that the variable it written), and some are a result
+/// of invalidation, like passing a variable as a non-const reference to a
+/// function.
+class ReachingDefinitionsCalculator : public ManagedAnalysis {
+  using GenSetBuilder = ReachingDefinitionsDetail::GenSetBuilder;
+
+public:
+  using GenSet = ReachingDefinitionsDetail::GenSet;
+  using DefinitionKind = ReachingDefinitionsDetail::Definition::DefinitionKind;
+  using DefinitionSet = ReachingDefinitionsDetail::DefinitionSet;
+
+  void calculate();
+
+  void dumpKillSet() const;
+  void dumpGenSet() const;
+  void dumpReachingDefinitions() const;
+
+  static ReachingDefinitionsCalculator *create(AnalysisDeclContext &Ctx) {
+    return new ReachingDefinitionsCalculator(Ctx.getDecl(), Ctx.getCFG());
+  }
+
+  static const void *getTag() {
+    static int x;
+    return &x;
+  }
+
+private:
+  ReachingDefinitionsCalculator(const Decl *D, const CFG *cfg);
+
+  void init();
+
+  const CFG *cfg;
+  GenSetBuilder GSBuilder;
+
+  // TODO: Make the GEN set public to allow clients to remove definitions, or
+  // possibly mark an invalidation as write.
+  // TODO: Should we change to ImmutableMap? Does this matter that much?
+  std::map<const CFGBlock *, GenSet> Gen;
+  std::map<const CFGBlock *, DefinitionSet> Kill;
+
+public:
+  std::map<const CFGBlock *, DefinitionSet> In;
+  std::map<const CFGBlock *, DefinitionSet> Out;
+};
+
+} // end of namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_REACHABLE_DEFINITIONS_H
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to