a) If a user provides a builtin implementation to LTO, it is discarded,
since by design LTO prefers builtins to user-provided versions of them. In
LTO, builtins are their own prevailing decl. There is an enhancement
request PR here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997
It definitely should be the other way around and builtins should get their
proper entry in the now existent symbol table.
Well, the way we stream builtin decls as special cases is indeed weird. I
recall I once tried to remove that code and it led to some regressions, but
in general it should not be necessary.
Yes, we seem to special-case builtins all over the place. I have a
kludge disabling this, just to work on (b) below.
b) LTO streaming happens before TMMARK. Since the TMMARK pass is the one
that instruments memory operations into __builtin_ITM_* calls, even if (a)
was fixed, LTRANS would have nothing to inline.
Which also means that this has nothing to do with LTO per se, just that
you'd need LTO to see the bodies of the "builtins". Use a small C testcase
where you provide the implementation of one of the builtins (well, the one
you end up using) and face the same issue.
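To make the point in (b) concrete, here is a minimal C sketch of what the
instrumentation does to a loop. It deliberately avoids -fgnu-tm and real
libitm entry points: my_tm_read_u4 is a hypothetical stand-in for a TM read
barrier, and the "after" function is hand-written to resemble what TMMARK
would produce. It is not actual compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a user-provided TM read barrier.  The name and
   signature are assumptions for illustration, not libitm's ABI. */
uint32_t my_tm_read_u4(const uint32_t *addr)
{
    assert(addr != 0);
    return *addr;   /* a real barrier would also log or validate the access */
}

/* What the optimizers see before TMMARK: plain loads, so memory and loop
   optimizations can still work on them. */
uint32_t sum_before_tmmark(const uint32_t *a, int n)
{
    uint32_t s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-written approximation of the body after instrumentation: each load
   has become a call that did not exist when inlining was decided. */
uint32_t sum_after_tmmark(const uint32_t *a, int n)
{
    uint32_t s = 0;
    for (int i = 0; i < n; i++)
        s += my_tm_read_u4(&a[i]);
    return s;
}
```

Both functions compute the same value. The difference is that the call in
the second one only comes into existence late, so the barrier body, even
when the user provides it, is no longer a candidate for inlining.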
Do I understand correctly that inlining the builtin at expansion time is not
good because the implementation detail may depend on how libitm was
configured?
Unfortunately, the tmmark pass can't be moved earlier, because the point is
to delay its work so memory and loop optimizations can do their thing before
memory operations are irreconcilably transformed into function calls.
This is the main problem however. As Richi pointed out, even in C this won't
work.
We decide inlining at WPA time; after that, no further inlining is possible
and all unreachable functions are removed. So when you invent new calls to
builtins along the way, you can't expect them to be reasonably inlined.
Yes, I have been playing with marking any such provided builtins with
cgraph_mark_force_output_node() in the IPA-tm pass. I assume that
anyone linking with implementations of the TM builtins must either want
them inlined, or want them in the final link. But your idea of a new
inline attribute is cleaner and far more generic.
Also you would not have the TM builtin bodies available in your ltrans unit
because nothing calls them. So anything that requires LTO (to see the
bodies in the first place) but does not expose the calls before LTO bytecode
output is not going to work.
Marking with cgraph_mark_force_output_node() in the IPA-tm pass fixes this.
Well, the only way I see here is to
a) have a special-purpose local inlining pass to handle these newly born
builtins. Basically you can re-purpose the early inliner for this and run it
after your pass (and we can generalize the machinery for other kinds of
beasts if needed). The early inliner fits better for this than the late
inliner.
Yes, this is what I've been doing, but I paused for y'all's input when I
had to either rematerialize the gimple bodies, or keep the gimple
optimizations from removing them as each function got compiled.
b) introduce a new kind of function for those builtins. You need a sort of
combination of the always_inline, extern and used attributes, but not quite.
The new kind of function must
1) make the partitioner ship the functions into every partition,
2) make unreachable function removal not remove them even if they seem
useless,
3) make code generation never produce offline copies of them even if they
are not removed by the unreachable function pass,
4) make the final check happy that this type of function may be kept in
memory till the end of compilation.
If this seems necessary I can implement this for you, but I am always
hesitant to add a new type of function into the machinery - we already face
the complexity of having quite a few of them.
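For reference, the closest combination available today can be sketched with
existing GNU C attributes. This is only an approximation under stated
assumptions: my_tm_load_u4 is a hypothetical name, and the sketch covers
point 3 (no offline copy) for calls that are already visible in the source.
It does nothing for points 1, 2 and 4, nor for calls that a pass creates
late.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical barrier; the name is an assumption, not a libitm symbol.
   extern inline + gnu_inline: the body is used only for inlining and is
   never emitted as a standalone function.
   always_inline: calls visible at this point are inlined even without
   optimization, so no external definition is needed at link time. */
extern inline __attribute__((gnu_inline, always_inline))
uint32_t my_tm_load_u4(const uint32_t *addr)
{
    assert(addr != 0);
    return *addr;
}

/* A caller whose calls exist in the source, so they can be inlined. */
uint32_t load_twice(const uint32_t *p)
{
    return my_tm_load_u4(p) + my_tm_load_u4(p);
}
```

Adding the used attribute on top would ask for the opposite of point 3,
which is presumably why the existing attributes fit "but not quite".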
I would be delighted if you could work on this, if you think a more
general solution to just forcing the node to be outputted is necessary.
But first let's get rth's input, because I'm still unsure whether the
payoff for inlining so late is sufficient to merit all this work.
Aldy