honggyu.kim added a comment.

I would also like to write about bug identification methods. Looking at the current CmpRuns.py script, getIssueIdentifier is defined as follows:

    def getIssueIdentifier(self) :
        id = self.getFileName() + "+"
        if 'issue_context' in self._data :
            id += self._data['issue_context'] + "+"
        if 'issue_hash' in self._data :
            id += str(self._data['issue_hash'])
        return id

https://github.com/llvm-mirror/clang/blob/master/utils/analyzer/CmpRuns.py#L69-L75

It has 3 items to generate a bug identification.
(1) file name
(2) function name - issue_context
(3) line offset from the beginning of function - issue_hash

As of now, we generate issue_hash by simply calculating the line offset from the first line of the function.

    FullSourceLoc UL(SM->getExpansionLoc(UPDLoc.asLocation()), *SM);
    FullSourceLoc UFunL(SM->getExpansionLoc(
        D->getUniqueingDecl()->getBody()->getLocStart()), *SM);
    o << " <key>issue_hash</key><string>"
      << UL.getExpansionLineNumber() - UFunL.getExpansionLineNumber()
      << "</string>\n";

https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp#L423-L452

On the other hand, this patch generates BugID as follows:

    llvm::SmallString<32> clang::GetIssueHash(const SourceManager *SM,
                                              FullSourceLoc &L,
                                              StringRef CheckerName,
                                              StringRef HashField,
                                              const Decl *D) {
      static llvm::StringRef Delimiter = "$";

      return GetHashOfContent(
          (llvm::Twine(CheckerName) + Delimiter +
           GetEnclosingDeclContextSignature(D) + Delimiter +
           std::to_string(L.getExpansionColumnNumber()) + Delimiter +
           NormalizeLine(SM, L, D) + Delimiter + HashField.str()).str());
    }

It has 6 items to generate a bug identification.
(1) file name (removed now)
(2) checker name
(3) function name - GetEnclosingDeclContextSignature(D)
(4) column number
(5) source line string after removing whitespace - NormalizeLine(SM, L, D)
(6) bug type - D->getBugType()

I think even if this patch is not accepted, we need to accept some of the methods suggested by this patch.

Current CmpRuns.py cannot distinguish the following 2 different bugs.

BUG 1.
garbage return value

    1 int main()
    2 {
    3   int a;
    4   return a;
    5 }

    test.c:4:3: warning: Undefined or garbage value returned to caller
      return a;
      ^~~~~~~~

BUG 2. garbage assignment

    1 int main()
    2 {
    3   int a;
    4   int b = a;
    5   return b;
    6 }

    test.c:4:3: warning: Assigned value is garbage or undefined
      int b = a;
      ^~~~~   ~

In this case, getIssueIdentifier() returns the same ID for both cases as below:

    <filename> + <function name> + <line offset from function>
    test.c + main + 2

We cannot distinguish those cases with the current CmpRuns.py, so at least we need to add checker information from <check_name>.

BUG 3. a single line of comment is added based on BUG 1 code.

    1 int main()
    2 {
    3   // main function
    4   int a;
    5   return a;
    6 }

    test.c:5:3: warning: Undefined or garbage value returned to caller
      return a;
      ^~~~~~~~

If we compare BUG3 with BUG1, CmpRuns.py shows those bugs are different even though only a single line of comment is added without actual modification.

    REMOVED: 'test.c:4:3, Logic error: Undefined or garbage value returned to caller'
    ADDED: 'test.c:5:3, Logic error: Undefined or garbage value returned to caller'
    TOTAL REPORTS: 1
    TOTAL DIFFERENCES: 2

I think we need to enhance issue_hash generation method in order to avoid those cases.

http://reviews.llvm.org/D10305

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
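
The three cases above can be reproduced with a small standalone Python sketch. It is not code from CmpRuns.py or from the patch. The helper names (current_identifier, normalize_line, content_identifier), the dictionary layout, the checker names and bug type strings used as sample values, and the choice of MD5 are all assumptions made for illustration. The first function mimics the file + context + line offset scheme, the second mimics the idea of hashing checker name, function name, column, whitespace-stripped source line and bug type joined by "$".

```python
import hashlib


def current_identifier(bug):
    """Mimic the existing scheme: file name + function name + line offset."""
    line_offset = bug["line"] - bug["body_start"]
    return bug["file"] + "+" + bug["function"] + "+" + str(line_offset)


def normalize_line(source_line):
    """Remove all whitespace; a simplified stand-in for NormalizeLine()."""
    return "".join(source_line.split())


def content_identifier(bug):
    """Hypothetical content-based ID (file name left out, as in the patch).

    MD5 is an arbitrary choice for this sketch, not necessarily what the
    patch uses inside GetHashOfContent().
    """
    fields = [
        bug["checker"],
        bug["function"],
        str(bug["column"]),
        normalize_line(bug["source"]),
        bug["bug_type"],
    ]
    return hashlib.md5("$".join(fields).encode("utf-8")).hexdigest()


# Sample values only; checker names and bug types are assumed for the demo.
bug1 = {"file": "test.c", "function": "main", "body_start": 2, "line": 4,
        "column": 3, "source": "  return a;",
        "checker": "core.uninitialized.UndefReturn",
        "bug_type": "Garbage return value"}
bug2 = {"file": "test.c", "function": "main", "body_start": 2, "line": 4,
        "column": 3, "source": "  int b = a;",
        "checker": "core.uninitialized.Assign",
        "bug_type": "Assigned value is garbage or undefined"}
# BUG 3 is BUG 1 with one comment line inserted, so only the line moves.
bug3 = dict(bug1, line=5)

if __name__ == "__main__":
    for name, bug in (("BUG 1", bug1), ("BUG 2", bug2), ("BUG 3", bug3)):
        print(name, current_identifier(bug), content_identifier(bug))
```

With these sample values the line-offset identifier is identical for BUG 1 and BUG 2 and changes for BUG 3, while the content-based identifier separates BUG 1 from BUG 2 and stays stable between BUG 1 and BUG 3. A content hash like this would still change if the reported line itself were edited, so it is a trade-off rather than a complete answer.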