Re: LTO inlining of transactional builtins

Aldy Hernandez Mon, 25 Jun 2012 07:36:27 -0700

a) If a user provides a builtin implementation to LTO, it is discarded,
since by design LTO prefers builtins to user-provided versions of them.  In
LTO, builtins are their own prevailing decl.  There is an enhancement
request PR here:


        http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997


It definitely should be the other way around and builtins should get their
proper entry in the now existent symbol table.


Well, the way we stream builtin decls as special cases is indeed weird.  I
recall I once tried to remove that code and it led to some regressions, but
in general it should not be necessary.

Yes, we seem to special-case builtins all over the place. I have a kludge disabling this, just to work on (b) below.

b) LTO streaming happens before TMMARK.  Since the TMMARK pass is the one
that instruments memory operations into __builtin_ITM_* calls, even if (a)
was fixed, LTRANS would have nothing to inline.


Which also means that this has nothing to do with LTO per se, just that
you'd need LTO to see the bodies of the "builtins".  Use a small C testcase
where you provide the implementation of one of the builtins (well, the one
you end up using) and face the same issue.
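
A minimal sketch of such a testcase follows, written by hand rather than
produced by the compiler.  It assumes the libitm read barrier for a 32-bit
load is named _ITM_RU4 and takes a pointer to the value; the real libitm
declaration also carries a calling-convention attribute that is omitted here.
The function bump_after_tmmark and the variable shared are hypothetical names
that only show what the instrumented load would look like.

```c
#include <stdint.h>

/* User-supplied body for one TM read barrier.  Assumption: simplified
   signature; libitm's own declaration adds a regparm-style attribute.  */
uint32_t _ITM_RU4(const uint32_t *addr)
{
    return *addr;
}

static uint32_t shared = 41;

/* With -fgnu-tm the programmer would write roughly:
       __transaction_atomic { v = shared + 1; }
   and the tmmark pass would later rewrite the load of `shared` into a
   call to the read barrier.  The call is spelled out by hand here so the
   sketch builds without -fgnu-tm.  */
uint32_t bump_after_tmmark(void)
{
    return _ITM_RU4(&shared) + 1;
}
```

In the hand-written form the call is visible to the inliner from the start.
In the real flow the call only appears once tmmark runs, after inlining
decisions have been made, which is the problem described above.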

Do I understand correctly that inlining the builtin at expansion time is not
good because the implementation detail may depend on how libitm was
configured?

Unfortunately, the tmmark pass can't be moved earlier, because the point is
to delay its work so memory and loop optimizations can do their thing before
memory operations are irreconcilably transformed into function calls.


This is the main problem, however.  As Richi pointed out, even in C this won't
work.  We decide inlining at WPA time; after that no inlining is possible and
all unreachable functions are removed.  So when you invent new calls to
builtins on the way, you can't expect them to be reasonably inlined.

Yes, I have been playing with marking any such provided builtins with cgraph_mark_force_output_node() in the IPA-tm pass. I assume that anyone linking with implementations of the TM builtins must either want them inlined, or want them in the final link. But your idea of a new inline attribute is cleaner and far more generic.

Also you would not have the TM builtin bodies available in your ltrans unit
because nothing calls them.  So anything that requires LTO (to see the
bodies in the first place) but does not expose the calls before LTO bytecode
output is not going to work.


Marking with cgraph_mark_force_output_node() in the IPA-tm pass fixes this.

Well, the only way I see here is to

a) have a special-purpose local inlining pass to handle these newly born
builtins.  Basically you can re-purpose the early inliner for this and run it
after your pass (and we can generalize the machinery for other kinds of beasts
if needed).  The early inliner fits this better than the late inliner.

Yes, this is what I've been doing, but I paused for y'all's input when I had to either rematerialize the gimple bodies, or keep the gimple optimizations from removing them as each function got compiled.


b) introduce a new kind of function for these builtins.  You need a sort of
combination of the always_inline, extern and used attributes, but not quite.
The new kind of function must
    1) make the partitioner ship the functions into every partition,
    2) make unreachable function removal not remove them even if they seem
       useless,
    3) make code generation never produce offline copies of them even if they
       are not removed by the unreachable function pass,
    4) make the final check happy that this type of function may be kept in
       memory till the end of compilation.
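
For illustration only, here is roughly how far the attributes that exist
today get you.  The "new kind of function" described above is a proposal, not
an existing GCC feature, and tm_read_u4, cell and load_cell are hypothetical
names.  The sketch combines always_inline with used: the body is inlined into
its caller and is kept even when nothing appears to reference it (point 2),
but an offline copy is still emitted, which is exactly what point 3 wants to
avoid.

```c
/* Approximation with existing attributes; not the proposed function kind. */
static inline __attribute__((always_inline, used))
unsigned tm_read_u4(const unsigned *addr)
{
    /* always_inline: inlined at every direct call site.
       used: the definition survives even if it looks unreferenced,
       at the cost of an out-of-line copy being emitted.  */
    return *addr;
}

static unsigned cell = 7;

/* Ordinary caller; the barrier body is inlined here. */
unsigned load_cell(void)
{
    return tm_read_u4(&cell);
}
```

This only covers calls that exist in the source.  It does nothing for points
1 and 4, which concern LTO partitioning and the end-of-compilation checks.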

If this seems necessary I can implement this for you, but I am always hesitant
to add a new type of function into the machinery - we already face the
complexity of having quite a few of them.

I would be delighted if you could work on this, if you think a more general solution than just forcing the node to be outputted is necessary. But first let's get rth's input, because I'm still unsure whether the payoff for inlining so late is sufficient to merit all this work.


Aldy
