Hi gentlemen.
I am looking again at LTO + TM. The goal is to be able to link with the
implemented _ITM_* functions in libitm.a, and have them inlined into the
transaction code when profitable.
To refresh everyone's memory, the original problem was two-fold:
a) If a user provides a builtin implementation to LTO, it is discarded,
since by design LTO prefers builtins to user-provided versions of them.
In LTO, builtins are their own prevailing decl. There is an
enhancement request PR here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997
b) LTO streaming happens before TMMARK. Since the TMMARK pass is the
one that instruments memory operations into __builtin_ITM_* calls, even
if (a) was fixed, LTRANS would have nothing to inline.
Unfortunately, the tmmark pass can't be moved earlier, because the point
is to delay its work so memory and loop optimizations can do their thing
before memory operations are irrevocably transformed into function calls.
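For anyone who hasn't looked at the instrumentation in a while, here is a
rough sketch in plain C of what tmmark conceptually does to a transactional
increment. The tm_read_u4/tm_write_u4 helpers and the barrier_calls counter
are hypothetical stand-ins I made up for illustration; the real barriers are
the _ITM_* entry points in libitm, whose exact signatures are not reproduced
here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the libitm read/write barriers.  They only
   count calls and forward the access, so the sketch stays
   self-contained.  */
static int barrier_calls;

static uint32_t
tm_read_u4 (const uint32_t *addr)
{
  assert (addr != NULL);
  barrier_calls++;
  return *addr;
}

static void
tm_write_u4 (uint32_t *addr, uint32_t val)
{
  assert (addr != NULL);
  barrier_calls++;
  *addr = val;
}

static uint32_t counter;

/* Before tmmark: the transaction body is still an ordinary load and
   store, so memory and loop optimizations can work on it.  */
static void
increment_before_tmmark (void)
{
  counter = counter + 1;
}

/* After tmmark (conceptually): each memory operation has become a call.
   Unless the callee bodies are available for inlining, later passes
   cannot see through them.  */
static void
increment_after_tmmark (void)
{
  uint32_t tmp = tm_read_u4 (&counter);
  tm_write_u4 (&counter, tmp + 1);
}
```

Both functions leave counter one higher; the second just goes through two
barrier calls to get there, which is exactly what we would like LTRANS to be
able to inline.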
FYI, the original thread was here:
http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01258.html
Now on to my current woes... (I'm concentrating on problem (b) here).
My original thought of moving the LTO streaming point after tmmark under
certain circumstances is a no-go. If at compile time we were to run
tmmark and then stream LTO out, at link/LTO time we would run: inline,
ipa-tm, optimizations, tmmark. Unfortunately, ipa-tm runs after the
inliner and adds new TM clones with __builtin_ITM_* calls. This would
mean that the new clones don't get TM calls inlined, while the lexical
__atomic blocks do.
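The ordering problem can be seen in a toy model. This is only a sketch:
struct toy_fn and the toy_* functions are invented for illustration and are
not GCC data structures or passes. Whether the clone's ITM calls appear in
ipa-tm itself or in the later tmmark makes no difference to the outcome,
since both run after the inliner in the sequence above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions are GCC's.  */
struct toy_fn
{
  const char *name;
  bool live;           /* Does this function body exist yet?  */
  bool has_itm_calls;  /* Memory ops were rewritten into ITM calls.  */
  bool itm_inlined;    /* Those ITM calls were seen by the inliner.  */
};

enum { LEXICAL_BLOCK, TM_CLONE, N_FNS };

/* Instrument every existing body that is not instrumented yet.  */
static void
toy_tmmark (struct toy_fn *fns)
{
  for (size_t i = 0; i < N_FNS; i++)
    if (fns[i].live && !fns[i].has_itm_calls)
      fns[i].has_itm_calls = true;
}

/* The inliner can only inline ITM calls that already exist.  */
static void
toy_inline (struct toy_fn *fns)
{
  for (size_t i = 0; i < N_FNS; i++)
    if (fns[i].live && fns[i].has_itm_calls)
      fns[i].itm_inlined = true;
}

/* ipa-tm creates the transactional clone.  */
static void
toy_ipa_tm (struct toy_fn *fns)
{
  fns[TM_CLONE].live = true;
}

/* Compile-time tmmark, stream out, then the link-time sequence.  */
static void
toy_run_proposed_order (struct toy_fn *fns)
{
  assert (fns != NULL);
  fns[LEXICAL_BLOCK] = (struct toy_fn) { "lexical block", true, false, false };
  fns[TM_CLONE] = (struct toy_fn) { "TM clone", false, false, false };

  toy_tmmark (fns);  /* compile time, before LTO streaming */
  toy_inline (fns);  /* link time: inline */
  toy_ipa_tm (fns);  /* link time: ipa-tm adds the clone */
  toy_tmmark (fns);  /* link time: tmmark instruments the clone */
}
```

After toy_run_proposed_order(), the lexical block has itm_inlined set while
the clone has ITM calls but itm_inlined still clear, which is the asymmetry
described above.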
rth and I have been talking about re-running inlining after tmmark
specifically for the TM builtins.
As you can imagine, the pass manager isn't designed to run IPA passes
after the regular optimization passes run, and I don't see a generic
need for this apart from the TM problem, although I could be wrong.
I tried playing with forcing another run of the early inliner after
tmmark, since it is designed as a GIMPLE_PASS, but by
pass_all_optimizations time, we have removed cgraph and gimple bodies.
Seeing the amount of setup I have to do to re-run early inlining after
the gimple optimizations have begun, perhaps I should steer my effort to
running proper IPA inlining some time after tmmark.
Before I embark on more surgery, I would like your input. I am
entertaining the following two options:
a) Have tmmark set up IPA infrastructure for ipa_inline() to
run, and run it directly at the end of the pass (instead of through the
pass manager). Ugly, but localized.
b) Modify execute_pass_list() so subpasses can be IPA passes. Set up
appropriate infrastructure as in (a), and run this specialized IPA
inline (or whatever subpasses we may add in the future). This is the
converse of ipa_*summaries*(), where an IPA pass runs local passes as
its subpasses. Generic, but I question whether anyone else will ever
need this.
What do you think? Am I nuts to even consider this? Other ideas?
BTW, I still question whether even inlining will gain us much, since
after tmmark there are few optimizations left to run (except RTL
optimizations). So I would guess that the only gains from TM builtin
inlining will be the speed from avoiding the calls, plus whatever
benefits the RTL optimizations can give us.
Still... I'm willing to play along a bit longer...
Thanks.
Aldy