Re: Use ODR for canonical types construction in LTO

Jan Hubicka Fri, 14 Jun 2019 04:24:12 -0700

> > Step 2 remain same.
> > Additional step 3 registers all odr derived types into canonical type hash.
> > This requires canonical type hash to play well with ODR types (i.e. not
> > consider them equivalent based on structural equivalety).
> > 
> > The decision on whether given type has ODR based canonical
> > type is stored in odr_based_tbaa_p and is used
> >  1) by gimple_canonical_types_compatible_p so the comparsion is
> >     behaving well after ODR types are inserted into the hash
> >  2) by alias.c to be build more precise alias sets - for ODR types
> >     it is not necessary to glob pointers into void *.
> > There is no need to update canonical type hash since I simply insert
> > hash codes of ODR names into canonical type hash cache.
> 
> ICK.  Can you please solve the C++ issue differently?  The patch
> also seems to do many other things ...


What do you mean by C++ issue here?  I do not think it is a problem of
C++, but rather how type system work.  It has named
structures/enums/unions and unnamed.  The unnamed ones needs to be
handled same way as canonical type hasing.  For named ones we want to
use type names unless we decide to unify them with types from other
languages that does not do ODR.

So first we figure out what clashes we have and then we insert remaining
types.
> > +   /* LTO non-ODR type merging does not make any difference between 
> > +      component pointer types.  We may have
> > +
> > +      struct foo {int *a;};
> > +
> > +      as TYPE_CANONICAL of 
> > +
> > +      struct bar {float *a;};
> > +
> > +      Because accesses to int * and float * do not alias, we would get
> > +      false negative when accessing the same memory location by
> > +      float ** and bar *. We thus record the canonical type as:
> > +
> > +      struct {void *a;};
> > +
> > +      void * is special cased and works as a universal pointer type.
> > +      Accesses to it conflicts with accesses to any other pointer
> > +      type.  */
> > +   bool void_pointers = in_lto_p
> > +                        && (!odr_type_p (type)
> > +                            || !odr_based_tbaa_p (type));
> 
> This seems like an independent improvement to me.

Kind of - the way we glob pointers here must match way we glob them
while deciding on canonical types (or we may just do subtypes walk for
every type merged into canonical type to get better info).

So w/o the rest of patch this would be wrong.  But I can do this
incrementaly - first refine canonical types and then refine aliasing
DAG.
> > Index: ipa-devirt.c
> > ===================================================================
> > --- ipa-devirt.c    (revision 271843)
> > +++ ipa-devirt.c    (working copy)
> > @@ -213,6 +213,8 @@ struct GTY(()) odr_type_d
> >    bool odr_violated;
> >    /* Set when virtual table without RTTI previaled table with.  */
> >    bool rtti_broken;
> > +  /* Set when we need to give up on ODR based TBAA info.  */
> > +  bool tbaa_enabled;
> 
> ?

I will fix the comment, used to have disabled flag.
This is about handling structurally equivalent ODR types.
> > @@ -332,10 +337,11 @@ hash_canonical_type (tree type)
> >    return hstate.end();
> >  }
> >  
> > -/* Returning a hash value for gimple type TYPE combined with VAL.  */
> > +/* Returning a hash value for gimple type TYPE combined with VAL.
> > +   If INSERT is true possibly insert TYPE to canonical type hash.  */
> >  
> >  static void
> > -iterative_hash_canonical_type (tree type, inchash::hash &hstate)
> > +iterative_hash_canonical_type (tree type, inchash::hash &hstate, bool 
> > insert)
> >  {
> >    hashval_t v;
> >  
> > @@ -343,7 +349,7 @@ iterative_hash_canonical_type (tree type
> >    type = TYPE_MAIN_VARIANT (type);
> >  
> >    if (!canonical_type_used_p (type))
> > -    v = hash_canonical_type (type);
> > +    v = hash_canonical_type (type, insert);
> >    /* An already processed type.  */
> >    else if (TYPE_CANONICAL (type))
> >      {
> > @@ -355,9 +361,11 @@ iterative_hash_canonical_type (tree type
> >        /* Canonical types should not be able to form SCCs by design, this
> >      recursion is just because we do not register canonical types in
> >      optimal order.  To avoid quadratic behavior also register the
> > -    type here.  */
> > -      v = hash_canonical_type (type);
> > -      gimple_register_canonical_type_1 (type, v);
> > +    type here.  Do not assign canonical types to ODR types - this
> > +    is done later using ODR name hash.  */
> > +      v = hash_canonical_type (type, insert);
> > +      if (insert)
> > +        gimple_register_canonical_type_1 (type, v);
> 
> So the comment explains that gimple_register_canonical_type_1 is to
> avoid quadratic behavior for types referenced multiple times.
> You remove this and I see why but I fear this will cause issues.

I was benmarking this on firefox.  Thing is that the hash is not very
large and we do not have that many types even for large programs, so
walking it multiple times does not seem much of issue.

I can introduce cache here that will put computed hash values when
insert is false and also look it up.  In a way perhaps this is better to
do if we actually get performance problem.  But can do it proactively.
> > +   /* Punt on ODR violations - if there are structurally different
> > +      types of the same name we are better to not try too hard to
> > +      use TBAA.  */
> > +   if (odr_type_violation_reported_p (t))
> > +     {
> > +       if (symtab->dump_file)
> > +         {
> > +           fprintf (symtab->dump_file,
> > +                    "Disabling ODR TBAA because of ODR violation: ");
> > +           print_generic_expr (symtab->dump_file, t);
> > +           fprintf (symtab->dump_file, "\n");
> > +         }
> 
> But you don't actually do anything to the type?

Yes, but not setting the canonical type the last loop will process it as
if it was unnamed type and use structural hash.
> > +           /* Populate canonical type hash with type name.  */
> > +           bool existed = canonical_type_hash_cache->put
> > +                            (prevail,
> > +                             htab_hash_string
> > +                               (IDENTIFIER_POINTER
> > +                                  (DECL_ASSEMBLER_NAME
> > +                                    (TYPE_NAME (prevail)))));
> 
> but referencing types used a different hash value for this type?
> (computed multiple times, see that quadraticness issue)

I never insert ODR derived types into the hash during streaming in,
so this is safe (as described in the comment). During the last pass I
add them bassed on the hash value I computed here.
> 
> Why use a different hash value and why not insert the hash in
> the cache during first processing?

We could get more conflicts then, but it is probalby not important - it
just seemed that given we have easy way to hash names well, we should
do it :)
> 
> > +           gcc_assert (!existed);
> > +           /* Set TYPE_CANONICAL for all variants and duplicates of T
> > +              including incompete ones.  */
> > +           set_type_canonical_for_odr_type (t, prevail);
> > +           /* And inform gimple_canonical_types_compatible_p that
> > +              it should rely on TYPE_CANONICAL compares only.  */
> > +           enable_odr_based_tbaa (t);
> > +           gcc_checking_assert (TYPE_CANONICAL (t) = prevail);
> > +         }
> > +     }
> > +      }
> > +
> > +  /* Now compute canonical types for all ODR derived types.  */
> > +  FOR_EACH_VEC_ELT (*types_to_register, i, t)
> > +    if (!TYPE_CANONICAL (t))
> > +      {
> > +        gimple_register_canonical_type (t);
> 
> note this will re-compute the hash value the old way.

Yes, t is type referring to ODR types but not having name by itself (or
we decided to give up because of ODR violation).
We will compute has now but because the hash values of types with
canonical types are hashed, we will use ODR names for ODR types.

This way we merge i.e.

struct a {int b};
struct {struct a c,d;};

and

struct a {int b};
struct {struct a c,d;};

from two different units under assumption that for some reason type
merging did not merge "struct a" and "struct a" for whatever reason (like
difference in virtual table).

The first type is ODR type so it will be
added into hash after verifying that there are no reasons to give up
and in this loop we will structural compare two copies of

struct {struct a c,d;};
struct {struct a c,d;};

assuming type merging failed earlier and give them same canoincal type.

> > +/* TYPE is expected to be record or union.  Figure out if it is an ODR
> > +   type or derived from it.  This is intended to be used only from
> > +   stream in process and rely on fact that all types with TYPE_CANONICAL
> > +   set are not ODR derived.  This reducts most of recursive lookups.  */
> 
> Still don't understand this issue fully but I don't really like this...
> iff then the FE should set this and we should stream it as extra bit.

Hope the above example makes it cleaner.  We could stream extra bit to
differentiate C++
struct {struct a c,d;};

form C

struct {struct a c,d;};

Since first is referring ODR type struct a, while second is not.

actually I also think it would be cleaner.  I am not sure where to put
it in though.
> >  
> > +  /* For types where we do ODR based TBAA the canonical type is always
> > +     set correctly, so we know that types are different if their
> > +     canonical types does not match.  */
> > +  if (trust_type_canonical
> > +      && (odr_type_p (t1) && odr_based_tbaa_p (t1))
> > +     != (odr_type_p (t2) && odr_based_tbaa_p (t2)))
> > +    return false;
> > +
> 
> ?  But then TYPE_CANONICAL is properly set and thus the code above
> catched it?  It looks like this is for the transitional case where
> TYPE_CANONICAL was delayed?  If so it doesn't really belong here
> but in the (hopefully single...) caller that matters?

Consider
struct a {int a;}

struct b {struct a a;}
struct {struct a a;}

Now "struct b" will have TYPE_CANONICAl but "struct <unnamed>" not yet.
We want to figure out that they are not equivalent.

Honza

Re: Use ODR for canonical types construction in LTO

Reply via email to