> It was.  Unfortunately, work on it stopped last year and it is unlikely
> that I will be assigned to this again.  I still have some personal
> interest in the feature, but given time restrictions, we should make
> contingency plans.
> 
> Perhaps the easiest option is to remove the feature.  WHOPR does not
> represent a lot of code over the basic LTO framework, so this should
> be relatively easy and non-intrusive.

I would prefer completing this feature rather than removing it.  I plan to work
on cleaning up the basic infrastructure (i.e. making the LTRANS stage work
correctly and really read in what the optimizations at the WPA stage decided)
during this stage1.  (This week my thesis really converged and I should submit
in two weeks or so, so this seems a realistic plan ;)

Here is a brief outline of what I would like to do:

The main problem of the current WHOPR implementation is that it is quite broken
in the way WPA->LTRANS streaming works.  We don't do anything but inlining at
WPA, and we don't stream out the results of IP propagation to LTRANS; instead
LTRANS redoes everything locally.

So we need:

  1) Update visibility stuff in the callgraph so it works on partial units in
     LTRANS (i.e. we should know that some function body was available, that it
     is in another LTRANS unit, and that we have summary knowledge about it,
     such as IPA-PTA sets, that is useful for compilation of the current unit).
  2) Fix WPA partitioning with respect to clones; it is completely broken.
  3) Make the callgraph streamer stream the callgraph with the results of IPA
     optimization.  That is, we need, for example, to stream inline_failed
     reasons from WPA->LTRANS, but we don't need to stream them in LTO.  At the
     moment we always stream them.  On the other hand, we never stream info
     about clones, which we must stream in WPA->LTRANS too.
  4) Fix the pass manager.  The stuff should work as follows:

     Compilation:
       Early optimization, analysis of all IPA passes
     WPA
       Propagation of all IPA passes
     LTRANS
       Transformation of all IPA passes
       rest of compilation

     Currently we do something like this:
     Compilation:
       Early optimization, analysis of all IPA passes
     WPA
       Propagation for inliner alone
     LTRANS
       Throw away inlining decisions made
       Do analysis, propagation and transformation of all IPA passes
       rest of compilation

     As we have read/write methods for summaries, we need read/write methods
     for optimization summaries, and we need some PM cleanups to integrate it
     better with the compilation process.
  5) To get some benefits we need to get back the parallel compilation of
     LTRANS etc. etc.

So as time allows, I would like to work on these items pretty much in this
order.  Of course I would be happy if someone beats me to it.
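
To make the intended split in item 4 concrete, here is a toy model of the three stages.  It is only a sketch: every name in it (compile_unit, wpa, ltrans, the summary layout, the inline_limit threshold) is hypothetical and does not correspond to real GCC data structures or APIs.  The point is only that WPA sees summaries and never bodies, and that LTRANS applies streamed decisions instead of recomputing them.

```python
# Toy model of the proposed pass flow.  All names are hypothetical and
# illustrative; none of this mirrors actual GCC internals.

def compile_unit(functions):
    """Compilation stage: the analysis half of each IPA pass.

    `functions` maps a name to (body_size, callees).  The returned
    per-function summaries are the only thing WPA gets to see.
    """
    return {name: {"size": size, "callees": list(callees)}
            for name, (size, callees) in functions.items()}


def wpa(summaries, inline_limit=10):
    """WPA stage: propagation only, working on summaries.

    Returns an "optimization summary": one decision per call edge,
    which is what would be streamed to LTRANS.
    """
    decisions = {}
    for caller, summary in summaries.items():
        for callee in summary["callees"]:
            small = (callee in summaries
                     and summaries[callee]["size"] <= inline_limit)
            decisions[(caller, callee)] = "inline" if small else "call"
    return decisions


def ltrans(partition, decisions):
    """LTRANS stage: transformation only.

    Applies the WPA decisions for callers in this partition and does
    not redo analysis or propagation locally.
    """
    return [f"{caller}: {action} {callee}"
            for (caller, callee), action in sorted(decisions.items())
            if caller in partition]
```

With something like units = {"main": (40, ["helper", "big"]), "helper": (5, []), "big": (200, ["helper"])}, one would call ltrans({"main"}, wpa(compile_unit(units))) to get the transformations for the partition containing main.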
> 
> If we do keep it, my expectation was to convert WHOPR into a
> profile-guided feature, much like what LIPO does.  Instead of trying
> to statically decide what TUs to compile together, we generate gimple
> bytecode for all TUs, then link normally (without combining them) and
> run the binary with profile generation enabled.
> 
> During the second compilation, we use the profile generated to decide
> what TUs to link together.  This avoids the complexity we currently
> have with WHOPR, which needs to re-exec itself and make static
> decisions.

Well, I think this is independent.
It makes a lot of sense to make profiling work in such a way that instrumentation
happens at link time with LTO and we can read the data back.  This is relatively
easy to do: we need to rewrite the profiling pass to work on SSA (that is easy,
desirable anyway, and has been on my TODO for a while).  Then we need to split
the gcov and -fprofile-generate profiling passes.  Gcov needs to happen early,
since early optimization will optimize out dead code and we want to count it.
Profile generation needs to be done late, since we want it at link time.

Then we will be able to do profiling at LTO.

With WHOPR I expect the situation to be a bit funnier, since one does not have
the function bodies and thus it is difficult to read in the profile,
get the callgraph profile and do something about it.

I guess one solution is to add callgraph profiling into our instrumentation, or
to make it possible to read the CFG of each function at the WHOPR stage and do
the profile read.  I haven't had much time to think this over.
> 
> Additionally, removing the current WHOPR code will not affect that plan.
> 
> The first target I would shoot for, however, is to replace -combine with 
> -flto.

:) We need debug info and to hammer out all the bugs, of course!  I would also
like to see the possibility of an LTO bootstrap without gold and the possibility
of not generating assembly into LTO .o files.  In the typical use where one
builds an app with LTO (such as bootstrap), this just slows everything down by
approximately a factor of two, which is IMO silly.

Honza
> 
> 
> Diego.
