[PATCH] D86671: [clang-tidy] Add new case type to check variables with Hungarian notation

Aaron Ballman via Phabricator via cfe-commits Thu, 24 Sep 2020 10:06:57 -0700

aaron.ballman added a comment.

In D86671#2284078 <https://reviews.llvm.org/D86671#2284078>, @dougpuob wrote:

> Hi @aaron.ballman
> About changing  `size_t nLength` to `cbLength`. I searched MSVC folders with 
> `size_t`, many names of variable start with `n`, or `i` in MFC related files. 
> So I prefer to keep it starts with `n`. Another side to name starts with 
> `cb`, I found variables like cbXxx are defined with ULONG, DWORD, or UCHAR 
> type.

I think the important point is that `cb` is used for APIs that specify a count 
of bytes, whereas `i`, `n`, and `l` are used for integers (there is no specific 
prefix for `size_t` that I'm aware of or that MSDN documents, so the lack of 
consistency there is not too surprising). That's why I mentioned that using a 
heuristic approach may allow you to identify the parameters that are intended 
to be a count of bytes so that you can use the more specific prefix if there's 
more specific contextual information available.

================
Comment at: 
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp:236
+      PrefixStr = "fn"; // Function Pointer
+    } else if (QT->isPointerType()) {
+      // clang-format off
----------------
dougpuob wrote:
> aaron.ballman wrote:
> > I'm not certain how valid it is to look at just the type and decide that 
> > it's a null-terminated string. For instance, the following is not an 
> > uncommon pattern: `void something(const char *buffer, size_t length);` and 
> > it would be a bit strange to change that into `szBuffer` as that would give 
> > an indication that the buffer is null terminated. You could look at 
> > surrounding code for additional information though, like other parameters 
> > in a function declaration. As an aside, this sort of heuristic search may 
> > also allow you to change `length` into `cbLength` instead of `nLength` for 
> > conventions like the Microsoft one.
> > 
> > However, for local variables, I don't see how you could even come up with a 
> > heuristic like you could with parameters, so I'm not certain what the right 
> > answer is here.
> OK (size_t nLength --> cbLength)
> 
> About the `void something(const char *buffer, size_t length)` pattern. `char` 
> is a special type which can express signed char and unsigned char.  One can 
> express C string(ASCII), another expresses general data buffer(in hex). I 
> think you are mentioning when it is a general data buffer. I agree with you 
> if it changed the name from `buffer` to `szBuffer` for general data buffer is 
> strange. I prefer to use `uint8_t` or `unsigned char` instead.
> 
> How about adding a new config option for it? like, `  - { key: 
> readability-identifier-naming.HungarainNotation.DontChangeCharPointer   , 
> value: 1 }` Users can make decision for their projects. For consistency with 
> HN, the default will still be changed to `szBuffer` in your case.
> 
> If I add the option, does it make sense to you from your side?
I'm not certain that approach would make sense because the decision needs to be 
made on a case-by-case basis. Taking the C standard library as an example set 
of APIs, a function like `char *strncpy(char * restrict s1, const char * 
restrict s2, size_t n);` is one where this functionality should not change `s1` 
or `s2` to be `szS1` or `szS2` because the `sz` modifier indicates a 
null-terminated string, but these strings do not have to be null terminated. 
However, an API like `char *strncat(char * restrict s1, const char * restrict 
s2, size_t n);` should change `s1` to be `szS1` because that string is required 
to be null terminated. In both cases, the declarations use `char`.

`uint8_t` and friends are newer datatypes (added in C99) and are not required 
to be present (so they're not portable to all implementations), which may 
account for buffer-based APIs still using `char`.

I think adding some heuristics to the check may help some of these situations 
pick the correct prefix, but I'm worried that cases like the two above are 
going to be almost intractable for the check. I don't have a good idea of how 
prevalent that kind of issue will be, though. One thing that would help me feel 
more comfortable would be to look at some data about how the check performs on 
some various code bases. e.g., pick some large or popular libraries that use C, 
C++, or both (Win32 SDK, LLVM headers, OpenSSL, etc), run the check over it, 
and check the results to see how often the prefix is right compared to how 
often it's wrong. If we find it's mostly right and only gets it wrong a small 
percentage of the time, then we've likely found a good compromise for the 
check's behavior. If we find it gets it wrong too often, then the kinds of 
tweaks needed may be more obvious.

Btw, the reason I think it's important to be careful about false positives for 
this particular check is because Hungarian notation is intended to convey 
semantic information in some cases (like whether a buffer is null terminated or 
not) and that can have security implications. This is the sort of check where 
it's easy for a person to change all the identifiers in their program assuming 
they're correct and only find out about the issue much later when someone else 
gets the semantics wrong because of an incorrect prefix.

================
Comment at: 
clang-tools-extra/clang-tidy/readability/IdentifierNamingCheck.cpp:319
+std::string
+IdentifierNamingCheck::getDeclTypeName(const clang::NamedDecl *Decl) const {
+  const ValueDecl *ValDecl = dyn_cast<ValueDecl>(Decl);
----------------
dougpuob wrote:
> aaron.ballman wrote:
> > `ND` instead of `Decl`.
> > 
> > The function name doesn't really help me to understand why you'd call this 
> > as opposed to getting the type information as a string from the `NamedDecl` 
> > itself. I'm a bit worried about maintaining this code as the language 
> > evolves -- Clang will get new keywords, and someone will have to remember 
> > to come update this code. Could you get away with using 
> > `Decl->getType()->getAsString()` and working with that rather than going 
> > back to the original source code and trying to parse manually?
> OK, I should do it. (ND instead of Decl.)
> 
> Use `Decl->getType()->getAsString()` is a good way. But HN is a 
> strongly-typed notation, if users haven't specific the include directories, 
> the checking result may look messy (it will be changed to unexpected type). 
> So I parse a string with `SourceLocation`, just for user-friendly.
> 
> I understand your consideration, maintaining-friendly is also important. I 
> can implement it with `Decl->getType()->getAsString()`, if my explanation 
> can't convince you still. It is also fine to me. Or you think it just need a 
> better appropriate function name?
> But HN is a strongly-typed notation, if users haven't specific the include 
> directories, the checking result may look messy (it will be changed to 
> unexpected type). 

I'm not certain I understand what issue you're worried about here. Do you have 
a code example that might help clarify?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86671/new/

https://reviews.llvm.org/D86671

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D86671: [clang-tidy] Add new case type to check variables with Hungarian notation

Reply via email to