Jakub, I am adding you to CC since I put my current thoughts on LTO and debug info in here.

> > Fork-fire-forget is really a much simpler choice here IMO; no worries
> > about shared resources, less debug hassle.
>
> It might be not as cheap as it is on Linux hosts on other hosts of
> course.  Also I'd rather try to avoid I/O than solving the issue
I still have some items on my list here:

1) Avoid having WPA decompress function sections (this won't bring much compile time improvement, as decompression is well below 10% of runtime).

2) Put variable initializers into named sections, just as function bodies are. Seeing Martin's systemtaps of firefox/gimp/inkscape, to my surprise the initializers are actually about as big as the text segment. While it seems a bit wasteful to put a single integer_cst there (and we can special-case this), it looks promising for vtables and other stuff. To make devirt work, we will need to load vtables into memory (or invent a representation to stream them some other way, which would be similarly big). Still, we will avoid the need to load them in 5000 copies and merge them.

3) I think a good part of the function/partitioning overhead is because abstract origin streaming is utterly broken. Currently we can have DECL_ABSTRACT_ORIGIN on a function. This I can now track by the used_as_abstract_origin flag, and I can stream those functions into the partitions using them. This is still wrong for a multitude of reasons:

   1) We really want the DECL_INITIAL tree of the functions used as abstract origins in the form before any gimple optimizations happened on them (that is, when the debug hook is called). This is not what happens - we stream the tree as it looks at LTO streaming time, i.e. after early optimizations. I think we may just (at the time of calling the debug hook) duplicate DECL_INITIAL the same way we duplicate decls for save_function_body, and save it elsewhere, making this tree the abstract origin of the offline copy of the function itself.

   2) dwarf2out doesn't really load the DECL_INITIAL tree, so it does something useful only when it is already there. It can simply call cgraph_get_body when it needs the DECL_INITIAL, but it doesn't because push_cfun causes an ICE.
      If we really can't push_cfun from the middle of the RTL queue, I suppose I can just save it elsewhere.

   3) It is not only the toplevel decl that has an origin, but all local vars in the function. I think this goes terribly wrong - these decls are not indexable, so they are stored into the function section of every function referring to them. They are then read in many duplicates and never merged with the DECL_INITIAL tree of the actual abstract origin. For some reason dwarf2out doesn't seem to ICE, but I also do not see how this can produce working debug info. Moreover, I think the duplicates contribute to our current debug info size problems with LTO. If we solve 1) as discussed above (i.e. by having separate block trees for functions that are abstract origins), we can then solve 3) by streaming those into the global decl stream and making cross-function_context tree references global.

   4) Of course, after early inlining a function may need abstract origins from multiple other functions. I do not track this at all. It may be easy to just collect a vector of the functions that are needed into cgraph_node.

Of course solving 1)-4) is a bit of early debug info without actually going as far as streaming the dwarf DIEs, but by using the BLOCK trees as a temporary representation. Incrementally we can make this saved BLOCK tree a dwarf DIE and have origins point to them instead of decls. To get reasonable streaming performance it would be nice to have a way to get abstract origin references across partitions, which debug info can accomplish.

That said, I now have the fork() patch in all my trees and enjoy 50% faster WPA times. I changed my mind about the claim that streaming should be disk bound - it is hard to hope for disk boundness for something that should fit in cache. We went down from 5GB to 2GB of streaming for Firefox, which is good. But we will see 4GB again once Martin's code layout work lands. I think it is in good part because of the origin fun above.

Honza

> by parallelizing it.
> Of course we can always come back to this
> kind of hack later.
>
> Richard.
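
For readers following the thread, the fork-fire-forget idea discussed above can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual patch: stream_partition, stream_partitions_forked, the max_jobs throttle and the part%d.o file names are all hypothetical. The point is just that each child gets a copy-on-write snapshot of the parent's memory, writes one partition, and exits, so nothing is shared and nothing needs locking.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for streaming one partition to disk.
   Returns 0 on success, 1 on failure.  */
static int
stream_partition (int part, const char *dir)
{
  char path[4096];
  snprintf (path, sizeof path, "%s/part%d.o", dir, part);
  FILE *f = fopen (path, "w");
  if (!f)
    return 1;
  fprintf (f, "partition %d\n", part);
  return fclose (f) != 0;
}

/* Record the outcome of one reaped child.  */
static void
account (int status, int *running, int *failed)
{
  (*running)--;
  if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
    (*failed)++;
}

/* Stream NPARTS partitions, one forked child per partition, keeping at
   most MAX_JOBS children alive.  Returns the number of partitions that
   failed to stream.  */
int
stream_partitions_forked (int nparts, int max_jobs, const char *dir)
{
  int running = 0, failed = 0, status;

  assert (dir != NULL);
  if (max_jobs < 1)
    max_jobs = 1;

  for (int part = 0; part < nparts; part++)
    {
      /* Throttle: wait for one child before starting another.  */
      if (running >= max_jobs && wait (&status) > 0)
        account (status, &running, &failed);

      /* Flush stdio so buffered output is not duplicated in the child.  */
      fflush (NULL);
      pid_t pid = fork ();
      if (pid == 0)
        /* Child: stream and leave without running the parent's
           atexit handlers.  */
        _exit (stream_partition (part, dir));
      else if (pid < 0)
        /* fork failed: fall back to streaming in the parent.  */
        failed += stream_partition (part, dir);
      else
        running++;
    }

  /* Reap the remaining children.  */
  while (running > 0 && wait (&status) > 0)
    account (status, &running, &failed);
  return failed;
}
```

A caller would use something like stream_partitions_forked (nparts, njobs, output_dir) and treat a nonzero result as an error. As noted in the quoted text, how cheap fork() is depends on the host, so the serial fallback path matters there.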