honggyu.kim added a comment.

I would also like to comment on the bug identification methods.
In the current CmpRuns.py script, the issue identifier is built as follows:

  def getIssueIdentifier(self) :
      id = self.getFileName() + "+"
      if 'issue_context' in self._data :
        id += self._data['issue_context'] + "+"
      if 'issue_hash' in self._data :
        id += str(self._data['issue_hash'])
      return id

https://github.com/llvm-mirror/clang/blob/master/utils/analyzer/CmpRuns.py#L69-L75

It combines 3 items to build a bug identifier:
(1) file name
(2) function name - issue_context
(3) line offset from the beginning of function - issue_hash

As of now, we generate issue_hash by simply calculating the line offset from 
the first line of the function body.

  FullSourceLoc UL(SM->getExpansionLoc(UPDLoc.asLocation()),
                   *SM);
  FullSourceLoc UFunL(SM->getExpansionLoc(
    D->getUniqueingDecl()->getBody()->getLocStart()), *SM);
  o << "  <key>issue_hash</key><string>"
    << UL.getExpansionLineNumber() - UFunL.getExpansionLineNumber()
    << "</string>\n";

https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Core/PlistDiagnostics.cpp#L423-L452

On the other hand, this patch generates BugID as follows:

  llvm::SmallString<32> clang::GetIssueHash(const SourceManager *SM,
                                            FullSourceLoc &L,
                                            StringRef CheckerName,
                                            StringRef HashField,
                                            const Decl *D) {
    static llvm::StringRef Delimiter = "$";

    return GetHashOfContent(
        (llvm::Twine(CheckerName) + Delimiter +
         GetEnclosingDeclContextSignature(D) + Delimiter +
         std::to_string(L.getExpansionColumnNumber()) + Delimiter +
         NormalizeLine(SM, L, D) + Delimiter + HashField.str()).str());
  }

It uses 6 items to build a bug identifier (the first has since been removed):
(1) file name (removed now)
(2) checker name
(3) function name - GetEnclosingDeclContextSignature(D)
(4) column number
(5) source line string after removing whitespace - NormalizeLine(SM, L, D)
(6) bug type - D->getBugType()
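To make the composition concrete, here is a rough Python sketch of the same idea. It is only an illustration, not the patch itself: the function names are hypothetical, the example checker name, signature and bug type in the usage note are assumed values, and MD5 via hashlib is my assumption since the snippet above does not show what GetHashOfContent does internally.

```python
import hashlib

DELIMITER = "$"  # same delimiter as in the C++ snippet above


def normalize_line(line):
    # Hypothetical stand-in for NormalizeLine(): drop all whitespace.
    return "".join(line.split())


def issue_hash(checker_name, decl_signature, column, source_line, bug_type):
    # Join the five remaining items with the delimiter, then hash the result.
    content = DELIMITER.join([
        checker_name,
        decl_signature,
        str(column),
        normalize_line(source_line),
        bug_type,
    ])
    # Assumption: MD5 is used only as an example digest.
    return hashlib.md5(content.encode("utf-8")).hexdigest()
```

For example, issue_hash("core.uninitialized.UndefReturn", "int main()", 3, "  return a;", "Garbage return value") does not depend on the line number at all, only on the content of the reported line and its column.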

I think that even if this patch is not accepted, we should adopt some of the 
methods it proposes.
The current CmpRuns.py cannot distinguish the following 2 different bugs.

BUG 1. garbage return value

  1 int main()
  2 {
  3   int a;
  4   return a;
  5 }
  
  test.c:4:3: warning: Undefined or garbage value returned to caller
    return a;
    ^~~~~~~~

BUG 2. garbage assignment

  1 int main()
  2 {
  3   int a;
  4   int b = a;
  5   return b;
  6 }
  
  test.c:4:3: warning: Assigned value is garbage or undefined
    int b = a;
    ^~~~~   ~

In this case, getIssueIdentifier() returns the same ID for both bugs, as shown below:
<filename> + <function name> + <line offset from function>
test.c + main + 2

We cannot distinguish those cases with the current CmpRuns.py, so at the very 
least we need to add the checker information from <check_name>.
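The collision can be reproduced with a small standalone sketch. The dictionaries below are hand-written stand-ins for the plist data of BUG 1 and BUG 2, the check_name values are assumed for illustration, and identifier_with_checker() is a hypothetical helper, not something that exists in CmpRuns.py.

```python
def current_identifier(report):
    # Mirrors the logic of getIssueIdentifier() quoted earlier.
    ident = report["file"] + "+"
    if "issue_context" in report:
        ident += report["issue_context"] + "+"
    if "issue_hash" in report:
        ident += str(report["issue_hash"])
    return ident


def identifier_with_checker(report):
    # Hypothetical variant: append the checker name as an extra component.
    return current_identifier(report) + "+" + report.get("check_name", "")


# Assumed plist values for BUG 1 and BUG 2 (check_name values are examples).
bug1 = {"file": "test.c", "issue_context": "main", "issue_hash": 2,
        "check_name": "core.uninitialized.UndefReturn"}
bug2 = {"file": "test.c", "issue_context": "main", "issue_hash": 2,
        "check_name": "core.uninitialized.Assign"}

print(current_identifier(bug1) == current_identifier(bug2))
print(identifier_with_checker(bug1) == identifier_with_checker(bug2))
```

With the current scheme both reports map to test.c+main+2, while the variant with the checker name keeps them apart.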

BUG 3. BUG 1 code with a single comment line added

  1 int main()
  2 {
  3   // main function
  4   int a;
  5   return a;
  6 }
  
  test.c:5:3: warning: Undefined or garbage value returned to caller
    return a;
    ^~~~~~~~

If we compare BUG 3 with BUG 1, CmpRuns.py reports them as different bugs even 
though only a single comment line was added and the code itself is unchanged.

  REMOVED: 'test.c:4:3, Logic error: Undefined or garbage value returned to caller'
  ADDED: 'test.c:5:3, Logic error: Undefined or garbage value returned to caller'
  TOTAL REPORTS: 1
  TOTAL DIFFERENCES: 2

I think we need to improve the issue_hash generation method in order to avoid 
these cases.
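As a rough illustration of why a content-based key is more stable than a line offset, the sketch below compares the two on the BUG 1 and BUG 3 sources. Both helpers are hypothetical simplifications: line_offset() assumes the function body starts at the first line that is just "{", and content_key() only strips whitespace from the reported line.

```python
BUG1 = ["int main()", "{", "  int a;", "  return a;", "}"]
BUG3 = ["int main()", "{", "  // main function", "  int a;", "  return a;", "}"]


def line_offset(lines, report_line):
    # Offset of the reported line (1-based) from the opening brace of the body.
    body_start = lines.index("{") + 1
    return report_line - body_start


def content_key(lines, report_line):
    # Whitespace-stripped text of the reported line (1-based line number).
    return "".join(lines[report_line - 1].split())


# The warning is on line 4 in BUG 1 and on line 5 in BUG 3.
print(line_offset(BUG1, 4), line_offset(BUG3, 5))
print(content_key(BUG1, 4) == content_key(BUG3, 5))
```

The line offset changes from 2 to 3 when the comment is inserted, whereas the content-based key is the same for both versions.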


http://reviews.llvm.org/D10305



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits