On Tue, May 7, 2019 at 7:40 PM Richard Trieu <rtr...@google.com> wrote: > > > From: David Blaikie <dblai...@gmail.com> > Date: Mon, May 6, 2019 at 4:39 PM > To: Richard Trieu > Cc: cfe-commits > >> On Mon, May 6, 2019 at 4:24 PM Richard Trieu <rtr...@google.com> wrote: >> > >> > There was no cycle for this crash. >> >> Oh, yeah, didn't mean to imply there were - but that a system designed >> to prevent cycles might also be used/help prevent redundant work like >> this. >> >> > What happened is that an exponential runtime is reduced to a linear >> > runtime. Without this revision, ODR hashing would have worked if the >> > machine had enough memory and the user waited long enough. >> > >> > void foo(int a, int b) {} >> > When computing the ODR hash for function foo, it will visit the type int >> > twice, once per parameter. In general, re-visiting types shouldn't be a >> > problem, and in most cases, should be pretty fast. >> >> It does mean some potentially problematic worst-case situations where >> non-trivial types are mentioned more than once (eg: if, instead of >> int, it was a complex struct type - it wouldn't cycle, but it would do >> all that work twice (or many more times if it appears in more places >> in the entity being hashed) > > > See below in the answer to DWARF. ODRHash did have a system, it worked for a > while until it didn't, and was since removed. >> >> >> > class S { >> > void bar(S* s); >> > }; >> > There's actually two ways to visit the Decl behind S, >> > ODR::AddCXXRecordDecl and ODR::AddDecl. When computing the ODR hash of S, >> > ODR::AddCXXRecordDecl is used for a deep dive into the AST of S. When >> > reaching S another way, (via FunctionDecl bar, parameter s, PointerType >> > S*, RecordType S), then the CXXRecordDecl gets processed through >> > ODR::AddDecl, which only processes enough information to identify S, but >> > not any of its deeper details. This allows self-reference without >> > introducing cycles. >> >> Ah, OK - specifically to break the cycle. >> >> So the ODR hash of the function "void f(S*)" doesn't hash the >> implementation of S, (it uses AddDecl, not AddCXXRecordDecl)? But if >> it were "void f(S)" it would hash S? What about a member function that >> takes a parameter by value? ("struct S { void bar(S); }") > > > The three functions AddCXXRecordDecl, AddFunctionDecl, and AddEnumDecl are > the entry points from outside to use the ODRHash and nothing inside ODRHash > will call these functions. That means hashing "class S {};" > AddCXXRecordDecl is called with S. Every other example, "void f(S)", "void > bar(S);", etc will be called into AddDecl. The next question is probably, > how do you know if two functions "void f(S)" in two files refer to same class > S? The answer is, ODRHash doesn't know and doesn't care. But when Clang > imports both "void f(S)" functions, it will also import both S classes. > Since Clang checks, ODRHash doesn't need to.
Clang checks this by entering ODRHash separately (via AddCXXRecordDecl). So it's sort of an inductive proof based on the assumptions of the ODR - >"void f(S)" is ODR compliant if S is ODR compliant<, etc. That makes sense, though still does mean potentially a lot of redundant work in cases like this bug, but never circular work - because anonymous types can't form cycles (because they have no name to do that). I think it still /might/ be worth using a type stack/hash so there's never redundant work, even if it's non-circular work. >> >> >> > I think it would be possible to add some checks in debug mode to catch >> > cycles. I'm not sure it can detect redundant work as the function foo >> > example above shows that visiting the same types over multiple times is >> > expected. >> >> Both for efficiency and to avoid these cycles, it might be worthwhile >> to consider a different way to resolve this issue >> >> The reason these ideas come to my mind is that DWARF has a type hash >> that works in a different way to avoid cycles and redundant work. >> >> http://dwarfstd.org/doc/DWARF5.pdf - 7.32, Type Signature Computation. >> It works by assigning every type a number when it's first encountered >> (even before its contents are hashed), and if it's ever encountered >> again, hash the number again rather than going back into hashing the >> implementation. >> > Originally, ODR hashing did have a system similar to what DWARF had. > Relevant portions of 7.32 are 1, 4.a, and 4.b. Basically, maintain a list of > Type's, when first visiting a Type, add it to the list and process it, and if > the Type is ever seen again, use the index number instead reprocessing. > Worked well, and then the AST had a small change in it where now we needed > two different Type's to hash to the same thing. > https://reviews.llvm.org/rL335853 ripped this out. It's possible to replace > it, but it needs to be better than what we currently have. Ah - I think I understand/see. The uniqueness of Type*s is based on the profile hashing and it varies in interesting ways? That seems vaguely disconcerting - Hmm, in r335853, could you walk me through how the Type* DenseMap broke the ODR type hashing? - Dave > >> >> This way no type is hashed more than once, avoiding cycles and redundant >> work. >> >> > >> > >> > >> > From: David Blaikie <dblai...@gmail.com> >> > Date: Sat, May 4, 2019 at 9:06 AM >> > To: Richard Trieu >> > Cc: cfe-commits >> > >> >> Does the ODR hashing have some sort of cycle breaking infrastructure - >> >> so that if the same type is seen more than once (eg: classes have >> >> members that have pointers back to the outer class type, etc) they >> >> don't cause indefinite cycles? Should that infrastructure have caught >> >> these cases & avoided the redundant work? >> >> >> >> I'm curious to understand better how these things work/overlap/or don't. >> >> >> >> On Fri, May 3, 2019 at 9:20 PM Richard Trieu via cfe-commits >> >> <cfe-commits@lists.llvm.org> wrote: >> >> > >> >> > Author: rtrieu >> >> > Date: Fri May 3 21:22:33 2019 >> >> > New Revision: 359960 >> >> > >> >> > URL: http://llvm.org/viewvc/llvm-project?rev=359960&view=rev >> >> > Log: >> >> > Reduce amount of work ODR hashing does. >> >> > >> >> > When a FunctionProtoType is in the original type in a DecayedType, the >> >> > decayed >> >> > type is a PointerType which points back the original FunctionProtoType. >> >> > The >> >> > visitor for ODRHashing will attempt to process both Type's, doing >> >> > double work. >> >> > By chaining together multiple DecayedType's and FunctionProtoType's, >> >> > this would >> >> > result in 2^N Type's visited only N DecayedType's and N >> >> > FunctionProtoType's >> >> > exsit. Another bug where VisitDecayedType and VisitAdjustedType did >> >> > redundant work doubled the work at each level, giving 4^N Type's >> >> > visited. This >> >> > patch removed the double work and detects when a FunctionProtoType >> >> > decays to >> >> > itself to only check the Type once. This lowers the exponential >> >> > runtime to >> >> > linear runtime. Fixes https://bugs.llvm.org/show_bug.cgi?id=41625 >> >> > >> >> > Modified: >> >> > cfe/trunk/lib/AST/ODRHash.cpp >> >> > cfe/trunk/test/Modules/odr_hash.cpp >> >> > >> >> > Modified: cfe/trunk/lib/AST/ODRHash.cpp >> >> > URL: >> >> > http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/ODRHash.cpp?rev=359960&r1=359959&r2=359960&view=diff >> >> > ============================================================================== >> >> > --- cfe/trunk/lib/AST/ODRHash.cpp (original) >> >> > +++ cfe/trunk/lib/AST/ODRHash.cpp Fri May 3 21:22:33 2019 >> >> > @@ -703,14 +703,36 @@ public: >> >> > void VisitType(const Type *T) {} >> >> > >> >> > void VisitAdjustedType(const AdjustedType *T) { >> >> > - AddQualType(T->getOriginalType()); >> >> > - AddQualType(T->getAdjustedType()); >> >> > + QualType Original = T->getOriginalType(); >> >> > + QualType Adjusted = T->getAdjustedType(); >> >> > + >> >> > + // The original type and pointee type can be the same, as in the >> >> > case of >> >> > + // function pointers decaying to themselves. Set a bool and only >> >> > process >> >> > + // the type once, to prevent doubling the work. >> >> > + SplitQualType split = Adjusted.split(); >> >> > + if (auto Pointer = dyn_cast<PointerType>(split.Ty)) { >> >> > + if (Pointer->getPointeeType() == Original) { >> >> > + Hash.AddBoolean(true); >> >> > + ID.AddInteger(split.Quals.getAsOpaqueValue()); >> >> > + AddQualType(Original); >> >> > + VisitType(T); >> >> > + return; >> >> > + } >> >> > + } >> >> > + >> >> > + // The original type and pointee type are different, such as in >> >> > the case >> >> > + // of a array decaying to an element pointer. Set a bool to false >> >> > and >> >> > + // process both types. >> >> > + Hash.AddBoolean(false); >> >> > + AddQualType(Original); >> >> > + AddQualType(Adjusted); >> >> > + >> >> > VisitType(T); >> >> > } >> >> > >> >> > void VisitDecayedType(const DecayedType *T) { >> >> > - AddQualType(T->getDecayedType()); >> >> > - AddQualType(T->getPointeeType()); >> >> > + // getDecayedType and getPointeeType are derived from >> >> > getAdjustedType >> >> > + // and don't need to be separately processed. >> >> > VisitAdjustedType(T); >> >> > } >> >> > >> >> > >> >> > Modified: cfe/trunk/test/Modules/odr_hash.cpp >> >> > URL: >> >> > http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Modules/odr_hash.cpp?rev=359960&r1=359959&r2=359960&view=diff >> >> > ============================================================================== >> >> > --- cfe/trunk/test/Modules/odr_hash.cpp (original) >> >> > +++ cfe/trunk/test/Modules/odr_hash.cpp Fri May 3 21:22:33 2019 >> >> > @@ -4587,6 +4587,43 @@ int num = bar(); >> >> > #endif >> >> > } >> >> > >> >> > +namespace FunctionProtoTypeDecay { >> >> > +#if defined(FIRST) >> >> > +struct S1 { >> >> > + struct X {}; >> >> > + using Y = X(X()); >> >> > +}; >> >> > +#elif defined(SECOND) >> >> > +struct S1 { >> >> > + struct X {}; >> >> > + using Y = X(X(X())); >> >> > +}; >> >> > +#else >> >> > +S1 s1; >> >> > +// expected-error@first.h:* {{'FunctionProtoTypeDecay::S1::Y' from >> >> > module 'FirstModule' is not present in definition of >> >> > 'FunctionProtoTypeDecay::S1' in module 'SecondModule'}} >> >> > +// expected-note@second.h:* {{declaration of 'Y' does not match}} >> >> > +#endif >> >> > + >> >> > +#if defined(FIRST) >> >> > +struct S2 { >> >> > + struct X {}; >> >> > + using Y = >> >> > + X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(X( >> >> > + X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(X( >> >> > + X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(X( >> >> > + X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(X( >> >> > + )))))))))))))))) >> >> > + )))))))))))))))) >> >> > + )))))))))))))))) >> >> > + )))))))))))))))); >> >> > +}; >> >> > +#elif defined(SECOND) >> >> > +#else >> >> > +S2 s2; >> >> > +#endif >> >> > + >> >> > +} >> >> > + >> >> > // Keep macros contained to one file. >> >> > #ifdef FIRST >> >> > #undef FIRST >> >> > >> >> > >> >> > _______________________________________________ >> >> > cfe-commits mailing list >> >> > cfe-commits@lists.llvm.org >> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits