whisperity added a reviewer: aaron.ballman.
whisperity added a comment.

Right, let's bump. I've checked the output of the checker on a set of test 
projects (our usual testbed, I should say) which contains around 15 projects, 
half of which was C and the other half C++. All projects were analysed 
independently in a virtual machine. All projects were analysed after checking 
out the //latest release tag// that was closest to Jan 2019. (I know this 
sounds dated, but the whole reproduction image was originally constructed for 
another checker I'm working on, and I did not wish to re-do the whole //"What 
is needed to install this particular version?"// pipeline.) The more important 
thing is that //releases// were analysed, which should, in theory, 
under-approximate the findings because releases are generally more polished 
than in-the-works code. The check was run as-is (sans a minor rebase issue that 
does not affect functionality) currently in the patch, namely, Diff 259508 
<http://reviews.llvm.org/D20689?id=259508>.

In total, I had received **194 reports** (129 unique (1)). From this, I had 
found **8** (7 unique) **true positives** (marked //Confirmed bug// in 
CodeChecker terms), where something is visibly wrong with the call.
From this, there were 3 in a project where the called function's //comment// 
said that //"The order of arguments <> and <> does not matter."//, however, 
because this can never be figured out totally from the checker, I regarded the 
swapped call as a true positive. The fact that they had to seemingly create, 
inside the called function, some logic to detect and handle the swap shows that 
something was going wrong with the code's design for a long time.

In addition to these findings, I have identified **122** (75 unique) function 
call sites where the report about the potential swap (and the similarity to a 
//different// parameter name) is justifiable because the **understandability** 
of the code (especially to someone who is an outsider from virtually all of the 
projects analysed (2)) **is hindered** by the poor choice of argument or 
parameter names. The conclusions from these cases (marked //Intentional// in 
the CodeChecker database) are consistent with those drawn in [Pradel2013] and 
[Liu2016].

Now, onto the **false positives**. There were **64** (47 unique) cases. 
However, these cases can be further broken down into different categories, 
which I wasn't able to tally easily as CodeChecker only supports 3 unique 
categories, not infinite ones.

- Some of the false positives are what one would say "borderline": if the 
person reading the code reads it accurately, the reported "swap" does not fall 
into the //understandability issue// category. However, a famous case of this 
is from Postgres: `reloid` (IIRC, meaning **rel**ation **o**wner **id**) and 
`roleid` (**role** (user) **id**) are the names of args/params of some 
functions. They are not swapped in the calls that exist in the code, but the 
similarity (swapping only 2 letters) makes it very easy to typo or misread the 
thing. In Postgres, there were 7 such cases.
- Approximately 5+5 cases are false positives but can be dealt with in 
heuristics. However, across the 17 projects, they do not account for a sizeable 
amount of cases.
  - Recursive function calls should be ignored. (This was mentioned in 
[Rice2017].)
  - Swapped and not-swapped calls appearing close to one another (i.e. 
constructs like `b ? f(x,y) : f(y,x)`) should be ignored too. This seems a bit 
harder to implement.
- There is a **bug** in the check's current implementation when calls from/to 
`operator()` is handled. (3) I'll look into fixing this.
- Binary operators should be ignored from reporting in general. Their 
parameters tend to have generic names, and the reports created from them is 
confusing.

Another generic observation is that the check's output is pretty wonky and hard 
to read at a glance, but this should be easy to fix. In addition, the 
observation of [Rice2017] about ignoring "patterned names" (i.e. `arg1, arg2, 
...`) seems like a useful thing to add to the implementation, even though I had 
no findings //at all// where "ignoring patterned names" would've squelched a 
false positive report.

Agreeably, this check is limited compared to the previously linked [Rice2017], 
as it only checks the names in the call, not all variables in the surrounding 
context.

----

//(1)//: I'm not exactly sure as to what "report uniqueing" in CodeChecker 
precisely does these days, but basically it uses the context of the bug and the 
checker message and whatnot to create a hash for the bug - "unique reports" 
mode shows each report belonging to the same hash only once.
//(2)//: From the set of test projects, I only have some hands-on experience 
with parts of LLVM and Apache Xerces.
//(3)//: Codes similar in nature to the following example of exact value 
forwarding to the call operator seems to still trigger the check. I have yet to 
actually pin down what causes this.

  struct S
  {
      int operator()(int a, int b, const char* c, double d);
  };
  
  struct T
  {
      S* s;
      int operator(int a, int b, const char* c, double d)
      {
          return (*s)(a, b, c, d);
      }
  };

//[Pradel2013]//: Michael Pradel, and Thomas R. Gross: Name-based analysis of 
equally typed method arguments. In: IEEE Transactions on Software Engineering, 
**39**(8), pp. 1127-1143, 2013.
//[Liu2016]//: Hiu Liu, et al.: Nomen est Omen: Exploring and exploiting 
similarities between argument and parameter names. In: 38th IEEE International 
Conference on Software Engineering, pp. 1063-1073, 2016.
//[Rice2017]//: Andrew Rice, et al.: Detecting argument selection defects. In: 
Proceedings of the ACM on Programming Languages, **1**, pp. 104:1-104:23, 2017.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D20689/new/

https://reviews.llvm.org/D20689

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to