teemperor created this revision.
Herald added a subscriber: xazax.hun.

This patch  aims at optimizing the CloneChecker for larger programs. Before 
this patch we took around 102 seconds to analyze sqlite3 with a complexity 
value of 50. After this patch we now take 2.1 seconds to analyze sqlite3.

The biggest performance optimization is that we now put the constraint for 
group size before the constraint for the complexity. The group size constraint 
is much faster in comparison to the complexity constraint as it only does a 
simple integer comparison. The complexity constraint on the other hand actually 
traverses each Stmt and even checks the macro stack, so it is obviously not 
able to handle larger amounts of incoming clones. The new order filters out all 
the single-clone groups that the type II constraint generates in a faster way 
before passing the fewer remaining clones to the complexity constraint. This 
reduced runtime by around 95%.

The other change is that we also delay the verification part of the type II 
clones back in the chain of constraints. This required to split up the 
constraint into two parts - a verification and a hash constraint (which is also 
making it more similar to the original design of the clone detection 
algorithm). The reasoning for this is the same as before: The verification 
constraint has to traverse many statements and shouldn't be at the start of the 
constraint chain. However, as the type II hashing has to be the first step in 
our algorithm, we have no other choice but split this constrain into two 
different ones. Now our group size and complexity constrains filter out a chunk 
of the clones before they reach the slow verification step, which reduces the 
runtime by around 8%.

I also kept the full type II constraint around - that now just calls it's two 
sub-constraints - in case someone doesn't care about the performance benefits 
of doing this.


https://reviews.llvm.org/D34182

Files:
  include/clang/Analysis/CloneDetection.h
  lib/Analysis/CloneDetection.cpp
  lib/StaticAnalyzer/Checkers/CloneChecker.cpp

Index: lib/StaticAnalyzer/Checkers/CloneChecker.cpp
===================================================================
--- lib/StaticAnalyzer/Checkers/CloneChecker.cpp
+++ lib/StaticAnalyzer/Checkers/CloneChecker.cpp
@@ -78,9 +78,10 @@
   // because reportSuspiciousClones() wants to search them for errors.
   std::vector<CloneDetector::CloneGroup> AllCloneGroups;
 
-  Detector.findClones(AllCloneGroups, RecursiveCloneTypeIIConstraint(),
-                      MinComplexityConstraint(MinComplexity),
-                      MinGroupSizeConstraint(2), OnlyLargestCloneConstraint());
+  Detector.findClones(
+      AllCloneGroups, RecursiveCloneTypeIIHashConstraint(),
+      MinGroupSizeConstraint(2), MinComplexityConstraint(MinComplexity),
+      RecursiveCloneTypeIIVerifyConstraint(), OnlyLargestCloneConstraint());
 
   if (ReportSuspiciousClones)
     reportSuspiciousClones(BR, Mgr, AllCloneGroups);
Index: lib/Analysis/CloneDetection.cpp
===================================================================
--- lib/Analysis/CloneDetection.cpp
+++ lib/Analysis/CloneDetection.cpp
@@ -381,9 +381,17 @@
   return HashCode;
 }
 
-size_t RecursiveCloneTypeIIConstraint::saveHash(
-    const Stmt *S, const Decl *D,
-    std::vector<std::pair<size_t, StmtSequence>> &StmtsByHash) {
+/// Generates and saves a hash code for the given Stmt.
+/// \param S The given Stmt.
+/// \param D The Decl containing S.
+/// \param StmtsByHash Output parameter that will contain the hash codes for
+///                    each StmtSequence in the given Stmt.
+/// \return The hash code of the given Stmt.
+///
+/// If the given Stmt is a CompoundStmt, this method will also generate
+/// hashes for all possible StmtSequences in the children of this Stmt.
+size_t saveHash(const Stmt *S, const Decl *D,
+                std::vector<std::pair<size_t, StmtSequence>> &StmtsByHash) {
   llvm::MD5 Hash;
   ASTContext &Context = D->getASTContext();
 
@@ -474,6 +482,14 @@
 
 void RecursiveCloneTypeIIConstraint::constrain(
     std::vector<CloneDetector::CloneGroup> &Sequences) {
+  RecursiveCloneTypeIIHashConstraint Hash;
+  Hash.constrain(Sequences);
+  RecursiveCloneTypeIIVerifyConstraint Verify;
+  Verify.constrain(Sequences);
+}
+
+void RecursiveCloneTypeIIHashConstraint::constrain(
+    std::vector<CloneDetector::CloneGroup> &Sequences) {
   // FIXME: Maybe we can do this in-place and don't need this additional vector.
   std::vector<CloneDetector::CloneGroup> Result;
 
@@ -513,8 +529,7 @@
 
       for (; i < StmtsByHash.size(); ++i) {
         // A different hash value means we have reached the end of the sequence.
-        if (PrototypeHash != StmtsByHash[i].first ||
-            !areSequencesClones(StmtsByHash[i].second, Current.second)) {
+        if (PrototypeHash != StmtsByHash[i].first) {
           // The current sequence could be the start of a new CloneGroup. So we
           // decrement i so that we visit it again in the outer loop.
           // Note: i can never be 0 at this point because we are just comparing
@@ -537,6 +552,14 @@
   Sequences = Result;
 }
 
+void RecursiveCloneTypeIIVerifyConstraint::constrain(
+    std::vector<CloneDetector::CloneGroup> &Sequences) {
+  CloneConstraint::splitCloneGroups(
+      Sequences, [](const StmtSequence &A, const StmtSequence &B) {
+        return areSequencesClones(A, B);
+      });
+}
+
 size_t MinComplexityConstraint::calculateStmtComplexity(
     const StmtSequence &Seq, const std::string &ParentMacroStack) {
   if (Seq.empty())
Index: include/clang/Analysis/CloneDetection.h
===================================================================
--- include/clang/Analysis/CloneDetection.h
+++ include/clang/Analysis/CloneDetection.h
@@ -254,20 +254,37 @@
 
 /// Searches all children of the given clones for type II clones (i.e. they are
 /// identical in every aspect beside the used variable names).
+///
+/// This constraint is also available to be executed in two phases, see
+/// RecursiveCloneTypeIIHashConstraint and RecursiveCloneTypeIIVerifyConstraint
+/// for more.
 class RecursiveCloneTypeIIConstraint {
+public:
+  void constrain(std::vector<CloneDetector::CloneGroup> &Sequences);
+};
 
-  /// Generates and saves a hash code for the given Stmt.
-  /// \param S The given Stmt.
-  /// \param D The Decl containing S.
-  /// \param StmtsByHash Output parameter that will contain the hash codes for
-  ///                    each StmtSequence in the given Stmt.
-  /// \return The hash code of the given Stmt.
-  ///
-  /// If the given Stmt is a CompoundStmt, this method will also generate
-  /// hashes for all possible StmtSequences in the children of this Stmt.
-  size_t saveHash(const Stmt *S, const Decl *D,
-                  std::vector<std::pair<size_t, StmtSequence>> &StmtsByHash);
+/// This constraint performs only the hashing part of the
+/// RecursiveCloneTypeIIConstraint.
+///
+/// It is supposed to be fast and can be used at the front of the constraint
+/// chain. However, it has a tiny chance to generate false-positives where the
+/// clones in a clone group are not actually type II clones of each other.
+/// This happens only due to hash collisions and they can be removed by the
+/// RecursiveCloneTypeIIVerifyConstraint.
+class RecursiveCloneTypeIIHashConstraint {
+public:
+  void constrain(std::vector<CloneDetector::CloneGroup> &Sequences);
+};
 
+/// This constraint performs only the verification part of the
+/// RecursiveCloneTypeIIConstraint.
+///
+/// It is supposed to be used behind the RecursiveCloneTypeIIHashConstraint
+/// and verifies that all clones in a group are actually type II clones of
+/// each other. However, this constraint is quite slow, so if you have faster
+/// constraints that can handle false-positives generated by hash collisions,
+/// then prepend those constraints to this one for optimal performance.
+class RecursiveCloneTypeIIVerifyConstraint {
 public:
   void constrain(std::vector<CloneDetector::CloneGroup> &Sequences);
 };
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to