Re: TYPE_BINFO and canonical types at LTO

Richard Biener Wed, 19 Feb 2014 04:16:55 -0800

On Tue, 18 Feb 2014, Jan Hubicka wrote:

> > > Non-ODR types born from other frontends will then need to be made to 
> > > alias all the ODR variants; that can be done by storing them into the 
> > > current canonical type hash.
> > > (I wonder if we want to support cross language aliasing for non-POD?)
> > 
> > Surely for accessing components of non-POD types, no?  Like
> > 
> > class Foo {
> > public:
> >   Foo();
> >   int *get_data ();
> >   int *data;
> > } glob_foo;
> > 
> > extern "C" int *get_foo_data() { return glob_foo.get_data(); }
> 
> OK, if we want to support this, then we want to merge.
> What about types with vtbl pointer? :)


I can easily create a C struct variant covering that.  Basically
in _practice_ I can inter-operate with any language from C if I
know its ABI.  Do we really want to make this undefined?  See
the (even standard) Fortran - C interoperability spec.  I'm sure
something exists for Ada interoperating with C (or even C++).
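
To make the vtbl-pointer case concrete, here is a minimal sketch of a
C-style mirror of a polymorphic C++ type.  The names Shape, ShapeMirror
and mirror_get_data are hypothetical, and the layout (one vtable pointer
at offset 0, then the declared members) is an ABI assumption, not
something the C++ standard guarantees.  The sketch copies bytes with
memcpy, so it sidesteps the aliasing question itself; it only shows that
the layout can be described from the C side.

```cpp
#include <cassert>
#include <cstring>

// C++ side: a polymorphic type, so each object carries a hidden vtable pointer.
struct Shape {
  virtual ~Shape() {}
  virtual int sides() const { return 0; }
  int *data = nullptr;
};

// Hypothetical C-style mirror of Shape.
// ASSUMPTION: the ABI places a single vtable pointer at offset 0, followed by
// the declared members (as the Itanium C++ ABI does for this simple case).
struct ShapeMirror {
  void *vptr;
  int *data;
};

// Read the 'data' member through the mirrored layout by copying the object's
// bytes into the mirror struct.
inline int *mirror_get_data(const Shape &s) {
  static_assert(sizeof(Shape) == sizeof(ShapeMirror),
                "ABI assumption violated: layouts differ in size");
  ShapeMirror m;
  std::memcpy(&m, static_cast<const void *>(&s), sizeof m);
  // Sanity check under the assumed ABI: a constructed object has a vtable.
  assert(m.vptr != nullptr);
  return m.data;
}
```

Code that accesses the object through a ShapeMirror pointer directly,
rather than through memcpy, is the situation where the choice of
canonical types decides whether TBAA treats the two accesses as aliasing.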

> > ?  But you are talking about the "tree" merging part using ODR info
> > to also merge types which differ in completeness of contained
> > pointer types, right?  (exactly equal cases should be already merged)
> 
> Actually I was speaking of canonical types here. I want to preserve more 
> of TBAA via honoring ODR and local types.

So, are you positive there will be a net gain in optimization when
doing that?  Please factor in the surprises you'll get when code
gets "miscompiled" because of "slight" ODR violations or interoperability
that no longer works.

> I want to change lto to not 
> merge canonical types for pairs of types of same layout (i.e. equivalent 
> in the current canonical type definition) but with different mangled 
> names.

Names are nothing ;)  In C I very often see different _names_ used
in headers vs. implementation (when the implementation uses a different
internal header).  You have struct Foo; in public headers vs.
struct Foo_impl; in the implementation.

> I also want it to never merge when types are local. For 
> inter-language TBAA we will need to ensure aliasing between a non-ODR 
> type of the same layout and all unmerged variants of an ODR type.
> Can it be done by attaching chains of ODR types into the canonical type
> hash and, when a non-ODR type appears, just making it alias with all of them?

No, how would that work?

> It would make sense to ODR merge in tree merging, too, but I am not sure if
> this fits the current design, since you would need to merge SCC components of
> different shape, and that seems hard, right?

Right.  You'd lose the nice incremental SCC merging (where we haven't even
yet implemented the nicest way - avoid re-materializing the SCC until
we know it prevails).

> It may be easier to ODR merge after streaming (during DECL fixup) just to make
> WPA streaming cheaper and to reduce debug info size.  If you use
> -fdump-ipa-devirt, it will dump the ODR types that did not get merged (only
> ones with vtable pointers in them ATM) and there are quite long chains for
> firefox. Surely then hundreds of duplicated ODR types will end up in the
> ltrans partition streams and they eventually hit debug output machinery.
> Eric sent me a presentation about doing this in LLVM:
> http://llvm.org/devmtg/2013-11/slides/Christopher-DebugInfo.pdf

Debuginfo is something completely separate and should be done separately
(early debug), so that the types are not streamed in the first place.

> > 
> > The canonical type computation happens separately (only for prevailing
> > types, of course), and there we already "merge" types which differ
> > in completeness.  Canonical type merging is conservative the other
> > way around - if we merge _all_ types to a single canonical type then
> > TBAA is still correct (we get a single alias set).
> 
> Yes, I think I understand that. One equivalence is kind of minimal so we merge
> only if we are sure there is no information loss, the other is maximal so we
> are sure that types that need to be equivalent by whatever underlying language
> TBAA rules are actually equivalent.

The former is just not correct - it would mean that not merging at all
would be valid, which it is not (you'd create wrong-code all over the 
place).

We still don't merge enough (because of latent bugs that I didn't manage
to fix in time) - thus we do not merge all structurally equivalent types
right now.
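
The direction of "conservative" can be shown with a toy model.  This is
only a sketch: ToyType, the layout strings and the three canon_*
functions are made up for illustration and do not reflect how GCC
represents or hashes types.  Two accesses may alias only if their types
share a canonical id, so a coarser partition can only add "may alias"
answers, while a finer one removes them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a type: a name plus a string describing its layout.
// ASSUMPTION: equal layout strings model "structurally equivalent".
struct ToyType {
  std::string name;
  std::string layout;
};

// Structural merging: types with the same layout share one canonical id.
inline std::vector<int> canon_by_layout(const std::vector<ToyType> &types) {
  std::map<std::string, int> id_of_layout;
  std::vector<int> canon;
  for (const ToyType &t : types) {
    int next_id = static_cast<int>(id_of_layout.size());
    // emplace keeps the existing id if this layout was already seen.
    canon.push_back(id_of_layout.emplace(t.layout, next_id).first->second);
  }
  return canon;
}

// Coarsest partition: everything shares canonical id 0 (a single alias set).
inline std::vector<int> canon_all_merged(const std::vector<ToyType> &types) {
  return std::vector<int>(types.size(), 0);
}

// Finest partition: every type gets its own id (no merging at all).
inline std::vector<int> canon_none_merged(const std::vector<ToyType> &types) {
  std::vector<int> canon;
  for (std::size_t i = 0; i < types.size(); ++i)
    canon.push_back(static_cast<int>(i));
  return canon;
}

// TBAA query in the toy model: same canonical id means "may alias".
inline bool may_alias(const std::vector<int> &canon, std::size_t a,
                      std::size_t b) {
  assert(a < canon.size() && b < canon.size());
  return canon[a] == canon[b];
}
```

With Foo and Foo_impl sharing a layout, canon_none_merged answers "no
alias" for the pair, which is the wrong-code case when one unit writes
the object as Foo and another reads it as Foo_impl; canon_all_merged
answers "may alias" for every pair, which loses optimization but stays
correct.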

> > > I also think we want explicit representation of types known to be local 
> > > to compilation unit - anonymous namespaces in C/C++, types defined 
> > > within function bodies in C and god knows what in Ada/Fortran/Java.
> > 
> > But here you get into the idea of improving TBAA, thus having
> > _more_ distinct canonical types?
> 
> Yes.
> > 
> > Just to make sure to not mix those two ;)
> > 
> > And whatever "frontend knowledge" we want to exercise - please
> > make sure we get a reliable way for the middle-end to see
> > that "frontend knowledge" (no langhooks!).  Thus, make it
> > "middle-end knowledge".
> 
> Sure that is what I am proposing - just have DECL_ASSEMBLER_NAME on TYPE_DECL
> and ODR flag. Middle-end when comparing types will test ODR flag and if flag
> is set, then it will compare via DECL_ASSEMBLER_NAME (TYPE_DECL (type)).
> No langhooks needed here + if other language has similar inter-unit
> equivalency it can use the same mechanism. Just turn the equivalency
> description into string identifiers.

Ok.  You have to be aware of the effects on inter-language 
interoperability though (you'll break it).  Thus I'd make this
guarded by -fextra-strict-aliasing and only auto-enable that when
all TUs are produced by the same frontend (easy enough to check I guess).
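
A rough model of the comparison under discussion, as a sketch only:
TypeDesc, same_canonical_type and all_same_frontend are hypothetical
names, the layout string stands in for the structural comparison, and
requiring equal layout before looking at names is a simplification.
The strict_names parameter models the -fextra-strict-aliasing guard
suggested above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy description of a type as the middle-end might see it.
struct TypeDesc {
  std::string layout;        // stand-in for structural identity
  bool odr;                  // hypothetical "ODR flag" set by the frontend
  std::string mangled_name;  // stand-in for DECL_ASSEMBLER_NAME of the TYPE_DECL
};

// Decide whether two types would be treated as the same for TBAA purposes.
// strict_names == false: purely structural (names are ignored).
// strict_names == true:  two ODR types must also agree on the mangled name;
//                        a non-ODR type still matches any type of equal layout.
inline bool same_canonical_type(const TypeDesc &a, const TypeDesc &b,
                                bool strict_names) {
  // Precondition of the sketch: ODR types carry a non-empty mangled name.
  assert(!a.odr || !a.mangled_name.empty());
  assert(!b.odr || !b.mangled_name.empty());
  if (a.layout != b.layout)
    return false;
  if (strict_names && a.odr && b.odr)
    return a.mangled_name == b.mangled_name;
  return true;
}

// Auto-enable check: true only if every TU came from the same frontend.
// An empty list is treated as "same" in this sketch.
inline bool all_same_frontend(const std::vector<std::string> &tu_frontends) {
  for (const std::string &f : tu_frontends)
    if (f != tu_frontends.front())
      return false;
  return true;
}
```

In this sketch the strict relation is not transitive: a non-ODR type
matches two ODR types of the same layout that do not match each other,
so it cannot be expressed by giving each type one canonical id.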

Richard.

> > Oh - and the easiest way to improve things is to get less types into
> > the merging process in the first place!
> 
> Yep, my experiments with not streaming BINFO are aimed at that.  I will
> collect some numbers and send.
> 
> Honza
> > 
> > Richard.
