[PATCH] D58749: [index-while-building] IndexRecordHasher

Dmitri Gribenko via Phabricator via cfe-commits Tue, 12 Mar 2019 08:48:37 -0700

gribozavr added a comment.

I left some comments, but it is difficult for me to review without 
understanding what the requirements for this hasher are, why some information 
is hashed in, and some is left out, could you clarify?  See detailed comments.


> This implementation is covered by lit tests using it through c-index-test in 
> upcoming patch.

This code looks very unit-testable.  It also handles many corner cases (what 
information is hashed and what is left out).  c-index-tests are integration 
tests that are not as good at that, and also they would be testing this code 
quite indirectly.



================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:175
+    // There's a balance between caching results and not growing the cache too
+    // much. Measurements showed that avoiding caching all decls is beneficial
+    // particularly when including all of Cocoa.
----------------
"caching all decls" => "caching hashes for all decls"


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:191
+
+  auto asCanon = [](QualType Ty) -> CanQualType {
+    return CanQualType::CreateUnsafe(Ty);
----------------
I don't think that hiding a "CreateUnsafe" in an operation with a benign name 
"asCanon" is a good idea.  Why not inline this lambda everywhere, it looks like 
it is defined to only save typing a few characters.


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:207
+    if(qVal)
+      Hash = hash_combine(Hash, qVal);
+
----------------
What about other qualifiers?

Why not use `Qualifiers::getAsOpaqueValue()` instead of manually converting a 
subset of qualifiers to unsigned?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:209
+
+    // Hash in ObjC GC qualifiers?
+
----------------
Is this a FIXME or?..


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:215
+    if (const PointerType *PT = dyn_cast<PointerType>(T)) {
+      Hash = hash_combine(Hash, '*');
+      CT = asCanon(PT->getPointeeType());
----------------
Why not hash in `T->getTypeClass()` uniformly for all types, instead of 
inventing a random sigil?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:219
+    }
+    if (const ReferenceType *RT = dyn_cast<ReferenceType>(T)) {
+      Hash = hash_combine(Hash, '&');
----------------
There is LValueReferenceType and RValueReferenceType, do we care about the 
difference?  If not, why not?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:254
+
+    break;
+  }
----------------
What about all other types -- array type, function type, decltype(), ...?

What about attributes?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:277
+template <typename T>
+hash_code IndexRecordHasher::tryCache(const void *Ptr, T Obj) {
+  auto It = HashByPtr.find(Ptr);
----------------
I had to read the implementation of this function to understand what it does.  
How about renaming it to "cachedHash()" ?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:291
+
+hash_code IndexRecordHasher::hashImpl(const Decl *D) {
+  return DeclHashVisitor(*this).Visit(D);
----------------
hashImpl => cachedHashImpl


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:409
+
+  // Unhandled type.
+  return Hash;
----------------
So what is the failure mode for unhandled types, what is the effect on the 
whole system?


================
Comment at: clang/lib/Index/IndexRecordHasher.cpp:476
+  case NestedNameSpecifier::TypeSpecWithTemplate:
+    // Fall through to hash the type.
+
----------------
I'd suggest to remove this comment.  It is more confusing than helpful.  It 
makes the code look like there's some processing (like a "break" in other 
cases), but at a close inspection it turns out to be a comment.  This kind of 
fallthrough is pretty common and I don't think it requires a comment.  (For 
example, hashImpl above has this kind of fallthough, but does not have comments 
like this.)


================
Comment at: clang/lib/Index/IndexRecordHasher.h:31
+/// Implements hashing of AST nodes suitable for the index.
+/// Caching all produced hashes.
+class IndexRecordHasher {
----------------
Could you add some explanation about the information that is being hashed?  
What is the underlying principle behind choosing to hash some aspect of the AST 
node (or to skip it).  For example, I see you're hashing names, but not, say 
source locations.  Or that QualTypes are first translated into canonical types.

What are the completeness constraints for this hasher?  What happens if we 
don't hash something that we should have?

"Caching all produced hashes" seems like an implementation comment.  Especially 
since the implementation chooses not to cache some of the hashes.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58749/new/

https://reviews.llvm.org/D58749



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D58749: [index-while-building] IndexRecordHasher

Reply via email to