Re: LTO inlining of transactional builtins

Jan Hubicka Fri, 22 Jun 2012 06:28:24 -0700

> On Fri, Jun 22, 2012 at 2:47 PM, Aldy Hernandez <al...@redhat.com> wrote:
> > Hi gentlemen.
> >
> > I am looking again at LTO + TM.  The goal is to be able to link with the
> > implemented _ITM_* functions in libitm.a, and have them inlined into the
> > transaction code when profitable.
> >
> > To refresh everyone's memory, the original problem was two-fold:
> >
> > a) If a user provides a builtin implementation to LTO, it is discarded,
> > since by design LTO prefers builtins to user-provided versions of them.  In
> > LTO, builtins are their own prevailing decl.  There is an enhancement
> > request PR here:
> >
> >        http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997
> 
> It definitely should be the other way around and builtins should get their
> proper entry in the now existent symbol table.


Well, the way we stream builtin decls as special cases is indeed weird.  I
recall I once tried to remove that code and it led to some regressions, but
in general it should not be necessary.
This however won't solve the problems...
> 
> > b) LTO streaming happens before TMMARK.  Since the TMMARK pass is the one
> > that instruments memory operations into __builtin_ITM_* calls, even if (a)
> > was fixed, LTRANS would have nothing to inline.
> 
> Which also means that this has nothing to do with LTO per-se, just that
> you'd need LTO to see the bodies of the "builtins".  Use a small C testcase
> where you provide the implementation of one of the builtins (well, the one
> you end up using) and face the same issue.
> 
> Do I understand correctly that inlining the builtin at expansion time is not
> good because the implementation detail may depend on how libitm was
> configured?
> 
> > Unfortunately, the tmmark pass can't be moved earlier, because the point is
> > to delay its work so memory and loop optimizations can do its thing before
> > memory operations are irreconcilably transformed into function calls.

This is the main problem, however.  As Richi pointed out, even in C this
won't work.  We decide inlining at WPA time; after that no further inlining
is possible and all unreachable functions are removed.  So when you invent
new calls to builtins along the way, you can't expect them to be reasonably
inlined.
This is a problem even for, e.g., the kernel folks, who for years provided
their own implementation of string operations that was fully inline but
needed to deal with places where GCC itself invented a call to one of them.
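
To make the kernel example concrete, here is a minimal sketch in plain C.
The names `struct packet`, `my_memcpy`, `copy_packet_explicit`,
`copy_packet_implicit` and `copies_agree` are made up for illustration; they
are not taken from the kernel or from libitm.  The point is that the second
copy routine contains no call in the source, yet the compiler is free to
lower the structure assignment to a `memcpy` call late in compilation, so a
user-provided inline implementation is not guaranteed to be used there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical payload type, large enough that a compiler may prefer a
   library call over an open-coded copy. */
struct packet {
    unsigned char bytes[256];
};

/* Hypothetical user-provided, fully inline copy routine, in the spirit of
   the kernel's own string operations. */
static inline void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Explicit call: written in the source, so the inliner can see it. */
static void copy_packet_explicit(struct packet *dst, const struct packet *src)
{
    assert(dst != NULL && src != NULL);
    my_memcpy(dst, src, sizeof *dst);
}

/* Implicit copy: no call is written here, but the compiler may emit a call
   to memcpy for this assignment.  Such a call only comes into existence
   after the inlining decisions have been made. */
static void copy_packet_implicit(struct packet *dst, const struct packet *src)
{
    *dst = *src;
}

/* Self-check: returns 1 if both routines copied the same bytes, else 0. */
int copies_agree(void)
{
    struct packet a, b, c;
    for (size_t i = 0; i < sizeof a.bytes; i++)
        a.bytes[i] = (unsigned char)i;
    copy_packet_explicit(&b, &a);
    copy_packet_implicit(&c, &a);
    for (size_t i = 0; i < sizeof a.bytes; i++)
        if (b.bytes[i] != a.bytes[i] || c.bytes[i] != a.bytes[i])
            return 0;
    return 1;
}
```

Whether the implicit copy really becomes a library call depends on the
target, the optimization level and the size of the structure, so treat this
as an illustration of the mechanism rather than a guaranteed reproducer.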
> >
> > FYI, the original thread was here:
> >
> >        http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01258.html
> >
> > Now unto my current woes... (I'm concentrating on problem (b) here).
> >
> > My original thought of moving the LTO streaming point after tmmark under
> > certain circumstances is a no go.  If at compile time we were to run tmmark
> > and then stream LTO out, at link/lto time we will do: inline, ipa-tm,
> > optimizations, tmmark.  Unfortunately, IPA-TM will add new TM clones with
> > __builtin_ITM_* calls.  This would mean that new clones don't get TM calls
> > inlined, while the lexical __atomic blocks do.
> >
> > rth and I have been talking about re-running inlining after tmmark
> > specifically for the TM builtins.
> >
> > As you can imagine the pass manager isn't designed to run IPA passes after
> > the regular optimization passes run, and I don't see a generic need for this
> > apart from the TM problem-- although I could be wrong.
> >
> > I tried playing with forcing another run of the early inliner after tmmark,
> > since it is designed as a GIMPLE_PASS, but by pass_all_optimizations time,
> > we have removed cgraph and gimple bodies. Seeing the amount of setup I have
> > to do to re-run early inlining after the gimple optimizations have begun,
> > perhaps I should steer my effort to running proper IPA inlining some time
> > after tmmark.
> 
> Also you would not have the TM builtin bodies available in your ltrans unit
> because nothing calls them.  So anything that requires LTO (to see the
> bodies in the first place) but does not expose the calls before LTO bytecode
> output is not going to work.

Well, the only ways I see here are to

a) have a special-purpose local inlining pass to handle these newly born
builtins.  Basically you can re-purpose the early inliner for this and run
it after your pass (and we can generalize the machinery for other kinds of
beasts if needed).  The early inliner fits this better than the late inliner.

b) introduce a new kind of function for these builtins.  You need a sort of
combination of the always_inline, extern and used attributes, but not quite.
The new kind of function must
   1) make the partitioner ship the functions into every partition,
   2) make unreachable function removal keep them even if they seem useless,
   3) make code generation never produce offline copies of them even if they
      are not removed by the unreachable function pass,
   4) make the final check happy that this type of function may be kept in
      memory till the end of compilation.
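
As a rough illustration of the "always_inline, extern and used, but not
quite" combination in (b), the closest thing I can sketch with attributes
that exist today is an `extern inline` function marked `gnu_inline` and
`always_inline`.  This is only an approximation under stated assumptions:
`my_itm_read_u4` and `sum_pair` are hypothetical names, not real libitm entry
points, and the sketch gives none of properties 1, 2 or 4 above.  It only
shows property 3 (no offline copy) for direct calls within one translation
unit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a TM read barrier.  With gnu_inline, an
   `extern inline` definition is used for inlining only and no out-of-line
   copy is emitted from this translation unit; always_inline asks the
   compiler to inline every direct call. */
extern inline __attribute__((gnu_inline, always_inline))
uint32_t my_itm_read_u4(const uint32_t *addr)
{
    return *addr;
}

/* Hypothetical caller standing in for instrumented transaction code. */
uint32_t sum_pair(const uint32_t *a, const uint32_t *b)
{
    assert(a != 0 && b != 0);
    return my_itm_read_u4(a) + my_itm_read_u4(b);
}
```

Taking the address of `my_itm_read_u4` would require an out-of-line
definition from somewhere else, which is one reason existing attributes are
"not quite" enough for what is described above.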

If this seems necessary I can implement it for you, but I am always hesitant
to add a new type of function to the machinery - we already face the
complexity of having quite a few of them.

Note that this will still have some ill effects - for example, if this magic
function calls a normal function (say, a static one), that function will end
up compiled in every partition even if unused.

We also used to have before-inlining/after-inlining modes that were subtly
different with EH, but that should be resolved now.

Honza
