> I find 'analyze' for the first stage confusing. We do no analysis
> there, we just produce summary info. The analysis is actually done by
> what you call 'read'. How about some variant of:
>
>   generate_summary_{function/variable}
>   analyze_{function/variable}
>   transform_{function/variable}
We seem to have a bit of confusion here, but I guess it is just terminology ;)
What we want to do is, I think, clear. The sequence should be:

1) Look at the function body (or variable) and produce some summary
2) Do optional serialization of the summary to disk and read it back
3) Perform interprocedural propagation based on knowledge of the full
   callgraph and all the summaries
4) Apply whatever we concluded in 3) to the function body at the time it
   is being compiled.

So for terminology I tend to use:

1) is what I call analyze, since we do look at the function and analyze
   its local properties. I have no problem calling it generate_summary.
3) is execute, since I want to use the existing "execute" hook of the
   passmanager. It still has the meaning "do the real work of the pass",
   so execute seems to match.
4) is called modify.

I certainly have no problem calling 1) generate_summary and 4) the
transform hook. But the "read" hook is really just intended to convert
whatever is in the on-disk format into the in-memory representation, so
I don't see why it should be called analyze_function/variable. I see
that the "execute" stage could be called "analyze", since we do the IPA
analysis there to decide what we want to optimize, but then it would be
just an "analyze" hook (without the function/variable variants) that
walks the callgraph and varpool itself, instead of being called on each
function/variable in isolation.

>
> Note that besides these hooks we will also need the central driver for
> whole-program analysis. My thinking is that this driver will be part of

Currently the job of driving the compilation process is implemented in
cgraphunit and the passmanager. I am leaning towards doing as much work
as possible on the passmanager side, with cgraphunit and
rest_of_compilation being to a large extent replaced by a few extra
passes added to the queue. I think this scheme scales to the planned IPA
optimizer too: all we need is to teach the passmanager the new hooks and
reorganize the queue a bit.
What we have now is:

1) the all_lowering_passes queue, used by cgraphunit when constructing
   the cgraph. I think it can stay this way
2) all_ipa_passes, driving all our IPA optimization
3) all_passes, together with some code in rest_of_compilation, driving
   the local optimization.

I am slowly working towards making all_passes part of the IPA passes, so
cgraphunit will need to worry only about the initial analysis of the
compilation unit. With IPA, I would propose adding
all_interunit_ipa_passes, which will be the point where compilation will
start with the LTO frontend and end with LTO compilation.

>
> Hmm, well. This could even be on two or three separate compilation
> passes. The first pass calls all the 'generate' hooks (this can be done
> via make -jN with all the initial .c files), a second pass calls all the
> analysis hooks (this is done by a single GCC invocation) and the third
> pass (also done via make -jN) calls all the modify hooks.
>
> We could structure things so that:
>
>   $ gcc -flto -O2 *.c
>
> does everything in one invocation. But I would also like to support the
> model where we operate in separate phases.

Yes, in order to be able to do the "execute" (or analyze, in your names)
hooks once and perform modify based on the results in parallel in other
compilation processes, we need to be able to write the optimization
decisions into summaries on disk. In my terminology these are the
"function summary" and the "optimization summary". The function summary
is produced early via the generate_summary hook and is placed in the
cgraph->local field, while the "optimization summary" is the result of
the "execute" hook and is placed in the cgraph->global field, or is
realized by changing the callgraph itself (i.e. producing new clones,
functions and such). I think we will need another pair of read/write
methods to serialize the optimization summary to disk, either into the
newly shipped .o files or into a common optimization decision file.
But I think we want to implement this incrementally: first do the model
optimizing everything in one link-time process (keeping in mind that we
will want to do more), and then implement the distribution, perhaps
based on Kenny's idea of duplicating all the analysis work on all nodes,
or via shipping the newly built .o files.

> > So the plan is to turn IPCP and other passes from doing real cloning into
> > same virtual cloning.
>
> Sounds good.
>
> > Thats about it. I would welcome all comments and if it seems useful I can
> > turn it into wiki page adding details to the current implementation plan
> > at wiki.
>
> Thanks for the detailed plan. Yes, please add it to the whopr wiki.
> The only aspects that are not too clear to me are what exactly do you
> plan to do in mainline.
>
> One idea would be to do all the basic framework during stage 1 and leave
> it in mainline. I would suggest doing as much as possible in mainline,
> so that it's then pulled in by the LTO branch.

Yes, this is what I am leaning towards. Doing as much work as feasible
in stage 1 now is IMO good, to avoid too many hidden dependencies and to
get the basic API changes more noticed and reviewed. The immediate plan
would be (in random order), mostly on mainline:

1) Continue with cleanups of the cgraphunit/passmanager queue: we don't
   want reload in cgraphunit.c :)
2) Enforce the split into cgraph->local/cgraph->global fields for
   existing passes
3) Add the new hooks into the passmanager structure and drop the RTL -d
   letters, leaving the new hooks unused
4) Teach the passmanager to call the new hooks in the proper order,
   split out all_interunit_ipa_passes containing only the inliner and
   IPCP for a start, and restructure them to this scheme (the inliner is
   organized this way internally; IPCP will need a little work)

At this point we will be able to implement the read/write pairs on the
LTO branch and hopefully get this all running. I am sure we will have a
lot of fun getting it running well.
>
> Kenny, what do you expect we could pull out from the LTO branch for
> stage 1? Does it make sense to open a new branch inheriting from LTO
> for this work?

I would hope to do the work on mainline and LTO itself rather than on a
subbranch, basically because LTO can't fly without this and vice versa.
Of course, if we aren't done with the plan above at the end of stage 1,
we can branch, but I hope it is not that tricky.

Thanks for the comments!
Honza

>
> Thanks. Diego.