> I find 'analyze' for the first stage confusing. We do no analysis
> there, we just produce summary info. The analysis is actually done by
> what you call 'read'. How about some variant of:
>
>   generate_summary_{function/variable}
>   analyze_{function/variable}
>   transform_{function/variable}
We seem to have a bit of confusion here, but I guess it is just terminology ;)
What we want to do is, I think, clear. The sequence should be:

1) Look at the function body (or variable) and produce some summary
2) Do optional serialization of the summary to disk and read it back
3) Perform interprocedural propagation based on knowledge of the full
   callgraph and all the summaries
4) Apply whatever we concluded in 3) to the function body at the time it
   is being compiled.

So for terminology I tend to use:

1) is what I call analyze, since we do look at the function and analyze
   its local properties. I have no problem calling it generate_summary.
3) is execute, since I want to use the existing "execute" hook of the
   passmanager. It still has the meaning "do the real work of the pass",
   so execute seems to match.
4) is called modify.

I certainly have no problem calling 1) generate_summary and 4) the
transform hook. But the "read" hook is really just intended to convert
whatever is in the on-disk format into the in-memory representation, so
I don't see why it should be called analyze_function/variable. I see
that the "execute" stage could be called "analyze", since we do the IPA
analysis there to decide what we want to optimize, but then it would be
just an "analyze" hook (without the function/variable variants) that
walks the callgraph and varpool itself, instead of being called on each
function/variable in isolation.

>
> Note that besides these hooks we will also need the central driver for
> whole-program analysis. My thinking is that this driver will be part of

Currently the job of driving the compilation process is implemented in
cgraphunit and the passmanager. I am leaning towards doing as much work
as possible on the passmanager side, with cgraphunit and
rest_of_compilation being to a large extent replaced by a few extra
passes added to the queue. I think this scheme scales to the planned IPA
optimizer too: all we need is to teach the passmanager the new hooks and
reorganize the queue a bit.
What we have now is:

1) the all_lowering_passes queue, used by cgraphunit when constructing
   the cgraph. I think it can stay this way
2) all_ipa_passes, driving all our IPA optimization
3) all_passes, together with some code in rest_of_compilation, driving
   the local optimization.

I am slowly working towards making all_passes part of the IPA passes, so
cgraphunit will need to worry only about the initial analysis of the
compilation unit. With IPA, I would propose adding
all_interunit_ipa_passes, which will be the point where compilation will
start with the LTO frontend and end with LTO compilation.

>
> Hmm, well. This could even be on two or three separate compilation
> passes. The first pass calls all the 'generate' hooks (this can be done
> via make -jN with all the initial .c files), a second pass calls all the
> analysis hooks (this is done by a single GCC invocation) and the third
> pass (also done via make -jN) calls all the modify hooks.
>
> We could structure things so that:
>
>   $ gcc -flto -O2 *.c
>
> does everything in one invocation. But I would also like to support the
> model where we operate in separate phases.

Yes, in order to be able to do the "execute" (or analyze, in your names)
hooks once and perform modify based on the results in parallel in other
compilation processes, we need to be able to write the optimization
decisions into summaries on disk. In my terminology these are the
"function summary" and the "optimization summary". The function summary
is produced early via the generate_summary hook and is placed in the
cgraph->local field, while the "optimization summary" is the result of
the "execute" hook and is placed in the cgraph->global field, or is
realized by changing the callgraph itself (i.e. producing new clones,
functions and such). I think we will need another pair of read/write
methods to serialize the optimization summary to disk, either into the
newly shipped .o files or into a common optimization decision file.
But I think we want to implement this incrementally: first do the model
optimizing everything in one link-time process (keeping in mind that we
will want to do more), and then implement the distribution, perhaps
based on Kenny's idea of duplicating all the analysis work on all nodes,
or via shipping the newly built .o files.

> > So the plan is to turn IPCP and other passes from doing real cloning into
> > same virtual cloning.
>
> Sounds good.
>
> > Thats about it. I would welcome all comments and if it seems useful I can
> > turn it into wiki page adding details to the current implementation plan
> > at wiki.
>
> Thanks for the detailed plan. Yes, please add it to the whopr wiki.
> The only aspects that are not too clear to me are what exactly do you
> plan to do in mainline.
>
> One idea would be to do all the basic framework during stage 1 and leave
> it in mainline. I would suggest doing as much as possible in mainline,
> so that it's then pulled in by the LTO branch.

Yes, this is what I am leaning towards. Doing as much work as feasible
in stage 1 now is IMO good, to avoid too many hidden dependencies and to
get the basic API changes more noticed and reviewed. The immediate plan
would be (in random order), mostly on mainline:

1) Continue with cleanups of the cgraphunit/passmanager queue: we don't
   want reload in cgraphunit.c :)
2) Enforce the split into cgraph->local/cgraph->global fields for
   existing passes
3) Add the new hooks into the passmanager structure and drop the RTL -d
   letters, leaving the new hooks unused
4) Teach the passmanager to call the new hooks in the proper order,
   split out all_interunit_ipa_passes containing only the inliner and
   IPCP for a start, and restructure them to this scheme (the inliner is
   organized this way internally; IPCP will need a little work)

At this point we will be able to implement the read/write pairs on the
LTO branch and hopefully get this all running. I am sure we will have a
lot of fun getting it running well.
>
> Kenny, what do you expect we could pull out from the LTO branch for
> stage 1? Does it make sense to open a new branch inheriting from LTO
> for this work?

I would hope to do the work on mainline and LTO itself rather than on a
subbranch, basically because LTO can't fly without this and vice versa.
Of course, if we aren't done with the plan above at the end of stage 1,
we can branch, but I hope it is not that tricky.

Thanks for the comments!
Honza

>
> Thanks. Diego.