================
@@ -184,13 +205,59 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
     return getChar();
 
   // Handle pointers and references.
-  // TODO: Implement C++'s type "similarity" and consider dis-"similar"
-  // pointers distinct.
-  if (Ty->isPointerType() || Ty->isReferenceType())
-    return createScalarTypeNode("any pointer", getChar(), Size);
+  //
+  // In C11 for two pointer type to alias it is required for them to be
+  // compatible [section 6.5 p7].
----------------
rjmccall wrote:
This comment is pretty good, but it'd be better if we had a good comment on the function itself, and then this could be specifically in the context of that. Please add something like this as a doc comment on the entire function:

```
/// Return an LLVM TBAA metadata node appropriate for an access through
/// an l-value of the given type. Type-based alias analysis takes advantage
/// of the following rules from the language standards:
///
/// C 6.5p7:
///   An object shall have its stored value accessed only by an lvalue
///   expression that has one of the following types:
///   - a type compatible with the effective type of the object,
///   - a qualified version of a type compatible with the effective
///     type of the object,
///   - a type that is the signed or unsigned type corresponding
///     to the effective type of the object,
///   - a type that is the signed or unsigned type corresponding
///     to a qualified version of the effective type of the object,
///   - an aggregate or union type that includes one of the
///     aforementioned types among its members (including,
///     recursively, a member of a subaggregate or contained union), or
///   - a character type.
///
/// C++ [basic.lval]p11:
///   If a program attempts to access the stored value of an object
///   through a glvalue whose type is not similar to one of the following
///   types the behavior is undefined:
///   - the dynamic type of the object,
///   - a type that is the signed or unsigned type corresponding
///     to the dynamic type of the object, or
///   - a char, unsigned char, or std::byte type.
///
/// The C and C++ rules about effective/dynamic type are broadly similar
/// and permit memory to be reused with a different type. C does not have
/// an explicit operation to change the effective type of memory; any store
/// can do it. While C++ arguably does have such an operation (the standard
/// global `operator new(size_t, void*)`), in practice it is important to
/// be just as permissive as C.
/// We therefore treat all stores as being able to
/// change the effective type of memory, regardless of language mode. That is,
/// loads have both a precondition and a postcondition on the effective
/// type of the memory, but stores only have a postcondition. This imposes
/// an inherent limitation that TBAA can only be used to reorder loads
/// before stores. This is quite restrictive, but we don't have much of a
/// choice. In practice, hoisting loads is the most important optimization
/// for alias analysis to enable anyway.
///
/// Therefore, given a load (and its precondition) and an earlier store
/// (and its postcondition), the question posed to TBAA is whether there
/// exists a type that is consistent with both accesses. If there isn't,
/// it's fine to hoist the load because either the memory is non-overlapping
/// or the precondition on the load is wrong (which would be UB).
///
/// LLVM TBAA says that two accesses with TBAA metadata nodes may alias if:
/// - the metadata nodes are the same,
/// - one of the metadata nodes is a base of the other (this can be
///   recursive, but it has to be the original node that's a base,
///   not just that the nodes have a common base), or
/// - one of the metadata nodes is a `tbaa.struct` node (the access
///   necessarily being a `memcpy`) with a subobject node that would
///   be allowed to alias with the other.
///
/// Our job here is to produce metadata nodes that will never say that
/// an alias is not allowed when there exists a type that would be consistent
/// with the types of the accesses from which the nodes were produced.
///
/// The last clause in both language rules permits character types to
/// alias objects of any type. We handle this by converting all character
/// types (as well as `std::byte` and types with the `may_alias` attribute)
/// to a single metadata node (the `char` node), then making sure that
/// that node is a base of every other metadata node we generate.
/// We can always just conservatively use this node if we aren't otherwise
/// sure how to implement the language rules for a type.
///
/// Read literally, the C rule for aggregates permits an aggregate l-value
/// (e.g. of type `struct { int x; }`) to be used to access an object that
/// is not part of an aggregate object of that type (e.g. a local variable
/// of type `int`). That case is perhaps sensical, but it would also permit
/// e.g. an l-value of type `struct { int x; float f; }` to be used to
/// access an object of type `float`, which is nonsense. We interpret this
/// clause as just intending to permit objects to be accessed through an
/// l-value that properly references a containing object.
///
/// C++ does not have an explicit rule for aggregates because in C++
/// a non-member access to an aggregate l-value is always a call to a
/// constructor or assignment operator, which then accesses all the
/// subobjects. In general, however, our interpretation of member
/// accesses is that they are also an access to the containing object
/// and therefore require such an object to exist at that address;
/// this permits us to just use the C rule for the accesses done by
/// trivial copy/move constructors/operators.
///
/// Both C and C++ permit some qualification differences. In C, however,
/// qualification can only differ at the outermost level, whereas C++
/// allows qualification to differ in nested positions through the
/// similar-types rule. This means that e.g. an l-value of type
/// `const float *` is not permitted to access an object of type
/// `float *` in C, but it is in C++. We use the C++ rule
/// unconditionally; the C rule is needlessly strict and frequently
/// violated in practice by code that we don't want to say is wrong.
/// We implement this by just discarding type qualifiers within pointer-like
/// types when deriving TBAA nodes; basically, we produce the TBAA node
/// for the type that is unqualified at all the recursive positions
/// considered by the C++ similar type rule. The implementation
/// doesn't actually construct this recursively-unqualified type as a
/// `QualType`; it just ignores qualifiers when recursing into types.
///
/// The similar-type rule only really applies to the standard CVR
/// qualifiers, which never affect representations. Qualifiers such as
/// address spaces that may involve a representation difference would
/// be totally appropriate to distinguish for TBAA purposes. However,
/// the current implementation just discards all qualifiers.
///
/// We handle the signed/unsigned clause by just making unsigned types
/// use the metadata node for the signed variant of the type. In the
/// language rules, this only applies at the outermost level, and e.g. an
/// l-value of type `signed int *` is not permitted to alias an object of
/// type `unsigned int *`. We choose not to distinguish those types when
/// pointer-type TBAA is enabled, however.
///
/// After discarding qualifiers and signedness differences as above,
/// the language rules come down to whether the types are compatible
/// (in C) or identical (in C++). Even in C, most types are compatible
/// only with themselves. The exceptions will be considered in the cases
/// below.
```

and then this comment can just be something like this:

```
// When PointerTBAA is disabled, all pointers and references use the same
// "any pointer" TBAA node. Otherwise, we generate a type-specific TBAA
// node and use the "any pointer" node as its base for compatibility between
// TUs with different settings.
// To implement the C++ similar-type rules
// (which we also adopt in C), we need to ignore qualifiers on the
// pointee type, and that has to be done recursively if the pointee type
// is itself a pointer-like type.
//
// Currently we ignore the differences between pointer-like types and just
// use this tag for the type: `p<pointer depth> <inner type tag>`.
// This means we give e.g. `char **` and `char A::**` the same TBAA tag.
```

https://github.com/llvm/llvm-project/pull/75177
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
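
For anyone following along, here is a toy model of the first two LLVM TBAA may-alias rules quoted in the suggested doc comment (same node, or one node is a base of the other), together with the `p<pointer depth> <inner type tag>` naming scheme. It is only a sketch under stated assumptions: `ToyTBAANode`, `isSameOrBaseOf`, `toyMayAlias`, `toyPointerTag`, and `toyDemo` are hypothetical names, not Clang or LLVM APIs; the `tbaa.struct` rule is not modeled; and the exact tag spelling (`p1 int`) is just one reading of the scheme described in the comment.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a scalar TBAA type node: a name plus a single parent.
// This is NOT llvm::MDNode; it only models the parent ("base") chain.
struct ToyTBAANode {
  std::string Name;
  const ToyTBAANode *Parent; // nullptr for the top of the toy hierarchy
};

// Returns true if A is B itself or a (possibly indirect) base of B.
inline bool isSameOrBaseOf(const ToyTBAANode *A, const ToyTBAANode *B) {
  for (const ToyTBAANode *N = B; N; N = N->Parent)
    if (N == A)
      return true;
  return false;
}

// Two accesses may alias if their nodes are the same or one node is a base
// of the other. Merely sharing a common base is not enough.
inline bool toyMayAlias(const ToyTBAANode *A, const ToyTBAANode *B) {
  return isSameOrBaseOf(A, B) || isSameOrBaseOf(B, A);
}

// Hypothetical spelling of the `p<pointer depth> <inner type tag>` scheme.
inline std::string toyPointerTag(unsigned Depth, const std::string &InnerTag) {
  return "p" + std::to_string(Depth) + " " + InnerTag;
}

// Small self-check: the char node is a base of everything, and the
// type-specific pointer node hangs off "any pointer".
inline void toyDemo() {
  ToyTBAANode Char{"omnipotent char", nullptr};
  ToyTBAANode AnyPtr{"any pointer", &Char};
  ToyTBAANode Int{"int", &Char};
  ToyTBAANode PInt{toyPointerTag(1, "int"), &AnyPtr};

  assert(toyMayAlias(&Char, &PInt));   // char is a base of every node
  assert(toyMayAlias(&AnyPtr, &PInt)); // keeps mixed-setting TUs compatible
  assert(!toyMayAlias(&Int, &PInt));   // a common base alone is not enough
}
```

In this model, a TU built without pointer TBAA tags a pointer access with the "any pointer" node, which is a base of `p1 int`, so the two accesses are still reported as possibly aliasing. That is the compatibility property the suggested comment asks for.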