LTO inlining of transactional builtins

Aldy Hernandez Fri, 22 Jun 2012 05:47:32 -0700

Hi gentlemen.

I am looking again at LTO + TM. The goal is to be able to link with the implemented _ITM_* functions in libitm.a, and have them inlined into the transaction code when profitable.


To refresh everyone's memory, the original problem was two-fold:

a) If a user provides a builtin implementation to LTO, it is discarded, since by design LTO prefers builtins to user-provided versions of them. In LTO, builtins are their own prevailing decl. There is an enhancement request PR here:


        http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997

b) LTO streaming happens before TMMARK. Since the TMMARK pass is the one that instruments memory operations into __builtin_ITM_* calls, even if (a) was fixed, LTRANS would have nothing to inline.

Unfortunately, the tmmark pass can't be moved earlier, because the point is to delay its work so memory and loop optimizations can do their thing before memory operations are irreconcilably transformed into function calls.


FYI, the original thread was here:

        http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01258.html

Now on to my current woes... (I'm concentrating on problem (b) here).

My original thought of moving the LTO streaming point after tmmark under certain circumstances is a no-go. If at compile time we were to run tmmark and then stream LTO out, at link/lto time we will do: inline, ipa-tm, optimizations, tmmark. Unfortunately, IPA-TM will add new TM clones with __builtin_ITM_* calls. This would mean that new clones don't get TM calls inlined, while the lexical __atomic blocks do.
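To make the ordering problem concrete, here is a toy model in Python. It is only a sketch and not GCC code: the function names, the symbolic ops ("mem", "itm_call", "inlined_itm"), and the clone naming are all assumptions made up for illustration. It replays the sequence described above (tmmark at compile time, then inline, ipa-tm, tmmark at link time) and shows the clone ending up with a call that the inliner never saw.

```python
# Toy model only. Pass names mirror the ones in this thread, but nothing
# here is GCC code. A function body is a list of symbolic operations.

def toy_tmmark(funcs, transactional):
    """Turn raw memory ops in transactional code into _ITM_* builtin calls."""
    for name in transactional:
        funcs[name] = ["itm_call" if op == "mem" else op for op in funcs[name]]

def toy_inline(funcs):
    """Inline every _ITM_* call that exists at the time the inliner runs."""
    for name in list(funcs):
        funcs[name] = ["inlined_itm" if op == "itm_call" else op
                       for op in funcs[name]]

def toy_ipa_tm(funcs, transactional, callees):
    """Create a transactional clone of each callee (hypothetical naming)."""
    for name in callees:
        clone = "tm_clone_" + name
        funcs[clone] = list(funcs[name])  # still uninstrumented memory ops
        transactional.add(clone)

def run_hypothetical_pipeline():
    funcs = {"atomic_block": ["mem", "mem"], "callee_f": ["mem"]}
    transactional = {"atomic_block"}

    # Compile time: tmmark runs before the LTO stream-out.
    toy_tmmark(funcs, transactional)

    # Link time: inline, ipa-tm, (optimizations), tmmark.
    toy_inline(funcs)
    toy_ipa_tm(funcs, transactional, ["callee_f"])
    toy_tmmark(funcs, transactional)
    return funcs

if __name__ == "__main__":
    for name, body in run_hypothetical_pipeline().items():
        print(name, body)
```

In this model the lexical block ends up with its calls inlined, while the clone is left holding a plain call, which is the asymmetry described above.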

rth and I have been talking about re-running inlining after tmmark specifically for the TM builtins.

As you can imagine, the pass manager isn't designed to run IPA passes after the regular optimization passes run, and I don't see a generic need for this apart from the TM problem, although I could be wrong.

I tried playing with forcing another run of the early inliner after tmmark, since it is designed as a GIMPLE_PASS, but by pass_all_optimizations time, we have removed cgraph and gimple bodies. Seeing the amount of setup I have to do to re-run early inlining after the gimple optimizations have begun, perhaps I should steer my effort to running proper IPA inlining some time after tmmark.

Before I embark on more surgery, I would like your input. I am entertaining the following two options:


a) Have tmmark set up IPA infrastructure for ipa_inline() to run, and run it directly at the end of the pass (instead of through the pass manager). Ugly, but localized.

b) Modify execute_pass_list() so subpasses can be IPA passes. Set up appropriate infrastructure as in (a), and run this specialized IPA inline (or whatever subpasses we may add in the future). This is the converse of ipa_*summaries*(), where we run subpasses that are local passes. Generic, but I question whether anyone else will ever need this.
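The structural difference between the two options can be sketched with a toy pass manager in Python. This is an illustration under loud assumptions, not GCC's pass manager: the `Pass` class, the "local"/"ipa" kinds, and every `toy_*` function are hypothetical stand-ins. In option (a) the tmmark step calls the inliner itself; in option (b) the pass-list walker learns to run a whole-program subpass underneath a per-function pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Toy model only: every name here is hypothetical, not a GCC interface.

def toy_tmmark_body(body):
    """Per-function step: instrument memory ops as _ITM_* calls."""
    return ["itm_call" if op == "mem" else op for op in body]

def toy_ipa_inline(program):
    """Whole-program step: inline the _ITM_* calls in every function."""
    for name in list(program):
        program[name] = ["inlined_itm" if op == "itm_call" else op
                         for op in program[name]]

# Option (a): tmmark runs the inliner directly at the end of its own work.
def option_a(program):
    for name in list(program):
        program[name] = toy_tmmark_body(program[name])
    toy_ipa_inline(program)  # bypasses the pass manager entirely
    return program

# Option (b): the pass list itself may contain IPA subpasses.
@dataclass
class Pass:
    name: str
    kind: str  # "local" runs per function, "ipa" runs once per program
    run: Callable
    subpasses: List["Pass"] = field(default_factory=list)

def toy_execute_pass_list(passes, program, log):
    for p in passes:
        if p.kind == "ipa":
            p.run(program)
        else:
            for name in list(program):
                program[name] = p.run(program[name])
        log.append(p.name)
        toy_execute_pass_list(p.subpasses, program, log)

def option_b(program):
    log = []
    passes = [Pass("tmmark", "local", toy_tmmark_body,
                   subpasses=[Pass("ipa_inline_tm", "ipa", toy_ipa_inline)])]
    toy_execute_pass_list(passes, program, log)
    return program, log
```

Both variants leave the toy program in the same state; what differs is whether the extra inlining step is hidden inside tmmark or visible to (and scheduled by) the pass list.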


What do you think?  Am I nuts to even consider this?  Other ideas?

BTW, I still question whether inlining will even gain us much, since after tmmark there are few optimizations left to run (except RTL optimizations). So I would guess that any gain from TM builtin inlining will be speed and any benefits RTL optimizations can give us. Still... I'm willing to play along a bit longer...


Thanks.
Aldy
