[PATCH] D69560: [clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-arguments-of-the-same-type' check

Whisperity via Phabricator via cfe-commits Thu, 24 Sep 2020 06:22:59 -0700

whisperity added a comment.

I'd like to resurrect the discussion about this. Unfortunately, the concept of 
`experimental-` group (in D76545 <https://reviews.llvm.org/D76545>) has fallen 
silent, too. We will present the results of this analysis soon at IEEE SCAM 
(Source Code Analysis and Manipulation) 2020 <http://ieee-scam.org/2020>. While 
a previous submission about this topic was rejected on the grounds of not being 
in line with the conference's usual, hyper-formalised nature, with one reviewer 
//especially// praising the paper for its nice touch with the empirical world 
and the fact that neither argument swaps (another checker worked on by a 
colleague) nor the interactions of this issue with the type system (this 
checker) was really investigated in the literature for C++ before, we've 
tailored both the paper and the implementation based on reviewer comments from 
the scientific world, and the comments here.
The reviews were mostly favourable, excl. comments about the lack of 
formalisation and the unfortunate lack of modelling for template 
instantiations. Such a cold cut, however, //only// introduces false negatives 
into the result set. Which is fine, as the users will never see those 
non-existent reports!

In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman 
wrote:

> This is a check that is almost impossible to comply with when developing new 
> code because it is insufficiently clear on what constitutes "related" 
> parameters.

I agree the CppCG is rather vague about this. From an angle, it seems 
intentional... "relatedness" is hard to pin down formally, it might vary from 
one "module" to the next even in the same project. In a subsequent patch to 
this, I've given a few good examples as to what can be reasonably culled by 
relatedness heuristics. They filter a good chunk of the results, letting go of 
most of the trivially "valid (but maybe nonsense) if swapped" functions (min, 
max, find, etc.)

In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman 
wrote:

> Two adjacent types in a function parameter list is an incredibly common 
> situation that arises naturally when writing code [...] and designing around 
> that from scratch is not always possible.

I see the reasoning behind this. However, this feels more like a motivation 
//against// the rule itself, and not the checker. We can debate the 
justification for the rule's existence, but (and correct me if I'm wrong) so 
far I have not seen any tool (that's publicly available, and is for C(++), 
etc...) that attempts to check your code satisfying this particular rule.

In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman 
wrote:

> (especially for constructors, where your parameter order has to match the 
> declaration order within the class)

CMIIW, but I believe this statement is not true. Aggregate initialisation does 
not care about your constructors, and yes, takes arguments in the order of 
fields. However, once you have a user-defined constructor, you should be able 
to define your parameters in any order you like. The only thing you //should// 
match is that the //member-init-exprs// of the constructor are evaluated in 
order of field declarations, not in the order you wrote them. But trespassing 
that rule already emits a compiler warning, and in reality, the trespass only 
causes an issue if there is a data dependency between your fields. You 
//should// ensure your //member-init-exprs// are in the right order (to guard 
that a later change introducing a data dependency won't blow you up), but this 
has zero effect on the order of parameters of the constructor.

In D69560#1936574 <https://reviews.llvm.org/D69560#1936574>, @aaron.ballman 
wrote:

> Retroactively applying this rule to an existing code base would be nearly 
> impossible for a project of any size.

It is hard in the general case (although there are some approaches and 
foundations that we could build upon with refactoring, I've taken up the mantle 
of spending some time with this research), but there are cases where 
"modernisation" efforts, especially tool-aided ones are already possible. I 
have been recently suggested reading this paper: //H. K. Wright: Incremental 
Type Migration Using Type Algebra <http://research.google/pubs/pub49386/>// 
While I agree that the case study in the paper is rather domain-specific if we 
consider the example there: once a tool has identified a variable/expression to 
not be a `double` but instead `abseil::Duration`, from a function call where 
such a refactored argument is passed, you can (or, hopefully, need to) replace 
the type of the parameter. And one such "adjacent parameters of the same type" 
has been refactored.

My position with the paper, and this implementation, is that it is more useful 
for new projects than existing ones. It might definitely not be useful //for 
LLVM// (a humongous project!), and it might not be possible to fix //every// 
report in an afternoon. But thanks to the vast amount of IDEs integrating 
Clang-Tidy, if someone wishes to adhere to this rule, they can have a 
(reasonably) immediate report the moment they trespassed the rule. You can also 
consider it in an incremental mode (see my comment above with incremental 
report gating via CodeChecker, but surely it's equally trivial to implement in 
other pipelines) or only in new or new-ish TUs/modules inside the project.

In addition, there are plenty of guidelines other than CppCG, which have at 
least some of their rules mapped to Tidy checks (and surely other analysers' 
routines). If a project (be it of any size and any domain) does not wish to 
adhere to a guideline (or a rule of a guideline), they can configure their 
invocation to not run those checks. I agree the fact that Tidy not supporting 
"opt-in" checkers by default is a bit of a drawback here, however, consider the 
"execution paths" one project might take, w.r.t. to this analysis:

//The checker is enabled, and it produces between one and a bazillion reports.//

1. They realise their project does not conform to CppCG (in which case why is 
the checker group enabled in the first place?) or specifically CppCG#I.24 and 
decide to disable the check. All is well in the world, and nothing happened 
excluding one new line being added to a configuration file.
2. Multiple projects big and famous enough see that their software are 
"non-compliant" (at least from the PoV of conformance to CppCG#I.24). They 
together start to question the necessity of such rule... maybe it shouldn't be 
a rule in the first place, because it's woefully stupid and impossible to 
adhere to. This is actionable data one can talk about at conferences and go to 
the guideline maintainers(?) with, and, if so, the rule gets 
deprecated/abolished/move to an "intellectual nitpicking" section. (In this 
case, implementing this check was a nice learning opportunity, and eventually, 
it will also be removed from Tidy.)
3. A //new// projects start developing, and they start seeing this rule. They 
can also decide whether or not conformance to the rule is something they want 
to live with. If not, previous cases apply. If so, we start to grow projects 
that are //somewhat// more type-safe, or less error-prone.

So far, we do not know how painful actual conformance to this rule will be. 
And, as with most static analysis, we might never know if there ever was "one 
life saved" by someone deciding to heed this warning. This won't be a panacea 
overnight, but I still believe it, first, does not have to be, and second, is a 
step in the right direction.

In D69560#1890767 <https://reviews.llvm.org/D69560#1890767>, @vingeldal wrote:

> I'm all for investigating the performance of this check and basing any 
> decisions on data. Until we have that data I just want to challenge the claim 
> that there are no suitable ways to turn the check of for select portions of 
> the code base,

Computational performance is in line with Tidy checks in general. LLVM took 33 
minutes to run, generally smaller projects are less than a minute. (`-j24`)
As for the performance in terms of the results... I have a virtual image (a 
previous conference we submitted the paper to wanted us to submit reproduction 
images, but because the paper was rejected, I believe they have not archived 
the image). I've found a CodeChecker database with 15 projects analysed (incl. 
LLVM) but it's ~150 MiB in size... 😦

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69560/new/

https://reviews.llvm.org/D69560

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D69560: [clang-tidy] Add 'experimental-cppcoreguidelines-avoid-adjacent-arguments-of-the-same-type' check

Reply via email to