Hi gentlemen.

I am looking again at LTO + TM. The goal is to be able to link against the _ITM_* functions implemented in libitm.a, and have them inlined into the transaction code when profitable.

To refresh everyone's memory, the original problem was two-fold:

a) If a user provides an implementation of a builtin to LTO, it is discarded, since by design LTO prefers builtins over user-provided versions of them. In LTO, builtins are their own prevailing decl. There is an enhancement request PR here:

        http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997

b) LTO streaming happens before TMMARK. Since the TMMARK pass is the one that instruments memory operations into __builtin_ITM_* calls, even if (a) were fixed, LTRANS would have nothing to inline.
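To make (b) concrete, here is a rough, hand-written sketch of what a transaction body looks like once every memory access has been instrumented into barrier calls. The names sketch_RU4/sketch_WU4 are hypothetical stand-ins loosely modeled on libitm's 4-byte read/write barriers; the real ones live in libitm.a and do logging and validation, while these trivial versions just count calls and perform the access directly. Transaction begin/commit calls are omitted. Because these bodies are visible and static inline, the compiler is free to inline them; that is the effect we are after for the real libitm barriers, and it is only possible if the calls exist (post-tmmark) when the inliner runs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for libitm's read/write barriers. They only
   count how often they are called and then access memory directly. */
static unsigned barrier_reads, barrier_writes;

static inline uint32_t sketch_RU4(const uint32_t *addr)
{
    barrier_reads++;
    return *addr;
}

static inline void sketch_WU4(uint32_t *addr, uint32_t val)
{
    barrier_writes++;
    *addr = val;
}

static uint32_t x, y;

/* Source as the user wrote it:
       __transaction_atomic { x = y + 1; }
   Roughly what the body becomes after instrumentation: the load of y
   and the store to x are each replaced by a barrier call. */
static void instrumented_body(void)
{
    sketch_WU4(&x, sketch_RU4(&y) + 1);
}

/* Usage helper: run the instrumented body once and return the new x. */
static uint32_t demo(uint32_t initial_y)
{
    y = initial_y;
    instrumented_body();
    return x;
}
```

Calling demo(41) returns 42 and leaves exactly one read barrier and one write barrier recorded.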

Unfortunately, the tmmark pass can't be moved earlier, because the whole point is to delay its work so memory and loop optimizations can do their thing before memory operations are irreconcilably transformed into function calls.

FYI, the original thread was here:

        http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01258.html

Now on to my current woes... (I'm concentrating on problem (b) here).

My original thought of moving the LTO streaming point after tmmark under certain circumstances is a no-go. If at compile time we were to run tmmark and then stream LTO out, at link (LTO) time we would run: inline, ipa-tm, optimizations, tmmark. Unfortunately, IPA-TM will add new TM clones with __builtin_ITM_* calls. This would mean that the new clones don't get their TM calls inlined, while the lexical __atomic blocks do.

rth and I have been talking about re-running inlining after tmmark specifically for the TM builtins.

As you can imagine, the pass manager isn't designed to run IPA passes after the regular optimization passes have run, and I don't see a generic need for this apart from the TM problem, although I could be wrong.

I tried forcing another run of the early inliner after tmmark, since it is designed as a GIMPLE_PASS, but by the time pass_all_optimizations runs, we have already removed the cgraph and the gimple bodies. Given the amount of setup needed to re-run early inlining after the gimple optimizations have begun, perhaps I should steer my effort toward running proper IPA inlining some time after tmmark.

Before I embark on more surgery, I would like your input. I am entertaining the following two options:

a) Have tmmark set up IPA infrastructure for ipa_inline() to run, and run it directly at the end of the pass (instead of through the pass manager). Ugly, but localized.

b) Modify execute_pass_list() so subpasses can be IPA passes. Set up the appropriate infrastructure as in (a), and run this specialized IPA inline (or whatever subpasses we may add in the future). This is the converse of ipa_*summaries*(), where we run subpasses that are local passes. Generic, but I question whether anyone else will ever need this.

What do you think?  Am I nuts to even consider this?  Other ideas?

BTW, I still question whether even inlining will gain us much, since after tmmark there are few optimizations left to run (except the RTL optimizations). So I would guess that any gain from inlining the TM builtins will be the speed gained by eliminating the calls themselves, plus whatever the RTL optimizations can give us. Still... I'm willing to play along a bit longer...

Thanks.
Aldy
