On Sat, Dec 09, 2006 at 11:03:21AM -0500, Daniel Berlin wrote:
> On 12/9/06, Basile STARYNKEVITCH <[EMAIL PROTECTED]> wrote:
> > On Fri, Dec 08, 2006 at 07:09:23PM -0500, Daniel Berlin wrote:
> >
> > > You see, we currently waste a lot of memory to avoid the fact that our
> > > GC is very slow.
> > > We still take it on the chin when it comes to locality. Previous
> > > things such as moving basic blocks from alloc_pools (which are
> > > contiguous) to gc'd space cost us 2-3% compilation time alone, because
> > > of how bad our GC places objects.
> >
I wrote the paragraph below, but it was poorly phrased, and I apologize
for having expressed my thoughts badly.

> > Even 25% of current GCC compilation time is a noise level to me. If I
> > achieve 1000% of current GCC -O3 compilation time, I will be very
> > proud of me. So I really do not care about 3%, and I thought that my
> > proposal won't cost a lot if it is not used (because if they are no
> > finalized object, GCC won't run much slower...).

The 3%, 25% or 1000% figures above (which are only guesses) apply only
to global static analysis passes. They do not apply to the usual GCC
passes that everyone uses with -O2 (or even -O3).

I should have emphasized that the 3% I do not care about concerns the
overall cost of the future expensive static analysis passes (which the
GlobalGCC project is about), not the usual performance of the compiler
(which I care about, like everyone). Of course 3% is meaningful in a
usual compiler run (typically -O2), and I am careful not to spoil it.
Sorry for having phrased my sentence wrongly.

To repeat myself: the huge overhead I am considering is only for
expensive static analysis passes, which are usually disabled. Hence I
am discussing here some ideas to avoid spoiling the compiler, while
still bringing *usually disabled* features which could be useful to
global static analysis and perhaps also to other (expensive) passes of
GCC. I do care a lot about not increasing compile time when these
expensive passes are disabled.

My expertise is more in static analysis that produces human-readable
warnings, much as commercial tools such as PolySpace do. This means
analysing small to medium-sized programs (typically less than a few
hundred thousand lines of source code) whose source is fully available.
And my contractual obligations inside GlobalGCC (with respect to those
funding my work) are mostly to deliver some global static analysis
passes.

> That's great. If you want to make a compiler useless to almost
> everyone, go for it. Do it on a branch, go wild.
> I'm sure there are 6 or 7 people in the world who will use it because
> it matters to them that much.

First, the GlobalGCC project is about adding global static analysis,
which of course can and should usually be disabled. So of course I am
careful not to spoil the trunk.

All the equivalent static analysis tools I know about run much slower
than a compiler, and their users expect and accept this. And there are
people who are still interested in these tools and buy them. So there
is a niche market for such a project (otherwise it would not have been
funded). For example Airbus (which is inside the GlobalGCC consortium)
routinely uses static analysis tools to analyse industrial-strength
critical code (flying in the A330, A350, A380). Mandriva is also inside
the consortium, because they wanted a tool which helps them to port all
the software in the Mandriva Linux distributions to 64-bit hosts
(currently this task costs them dozens of man-years). Other industrial
partners, inside the consortium and outside, also care about static
analysis (but I am not speaking for any of them).

The end result of GlobalGCC should be something like a set of
additional passes which can be disabled both at configuration time and
at GCC compile time, and which are usually disabled unless a flag such
as -fdo-globalgcc-analysis [or call it what you like] is explicitly
given to the resulting GCC (which has been configured at build time
with something like --enable-globalgcc-analysis).

Do you agree a priori that providing a set of passes which are usually
disabled should not significantly impact the daily performance of GCC?

When the user requests them (through -fdo-globalgcc-analysis or
whatever other flag), and when GCC has been configured at build time to
provide them, additional global static analysis occurs. Otherwise, the
passes are not called at all.
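
To illustrate why a disabled pass should cost next to nothing, here is
a minimal, self-contained sketch of a gated pass. It is only an
illustration under stated assumptions: struct toy_pass and run_pass are
hypothetical stand-ins (not the real struct tree_opt_pass or the real
pass manager), and flag_globalgcc_analysis is the hypothetical flag
name used in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, as if set by -fdo-globalgcc-analysis.
   Cleared by default, which is the usual case.  */
static int flag_globalgcc_analysis = 0;

/* Simplified stand-in for a pass descriptor.  This is NOT the real
   struct tree_opt_pass; it keeps only the fields needed here.  */
struct toy_pass
{
  const char *name;
  bool (*gate) (void);            /* returns false to skip the pass */
  unsigned int (*execute) (void); /* the (possibly expensive) work  */
};

/* Counts how often the expensive analysis actually ran.  */
static unsigned int analysis_runs = 0;

static bool
gate_globalgcc (void)
{
  return flag_globalgcc_analysis != 0;
}

static unsigned int
execute_globalgcc (void)
{
  analysis_runs++;   /* placeholder for the costly global analysis */
  return 0;
}

static struct toy_pass pass_globalgcc =
  { "globalgcc", gate_globalgcc, execute_globalgcc };

/* Toy pass-manager step: when the gate says no, the only cost is one
   indirect call and one test of a flag.  */
static void
run_pass (struct toy_pass *p)
{
  assert (p != NULL && p->execute != NULL);
  if (p->gate != NULL && !p->gate ())
    return;
  p->execute ();
}
```

With the flag cleared, run_pass (&pass_globalgcc) returns right after
the gate test and the analysis never runs; after setting the flag, the
same call executes the pass.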

Again, a gate in a struct tree_opt_pass which starts with

    if (!flag_globalgcc_analysis) return 0;

should not cost much (much less than 1%, or hopefully even less than
0.1%) in the usual case where flag_globalgcc_analysis is cleared.

When requested, it is expected that we will provide whole-program
static analysis. This means that we somehow need to pass the
-fwhole-program -combine flags to GCC. By the way, the current
-fwhole-program option is rarely used: a Google Code Search for
fwhole-program gives only 20 occurrences, of which only one is outside
of GCC (and inside a comment). Hence, apparently, -fwhole-program is
not used at all today in open source code! Still, I think that
-fwhole-program is useful and should not be removed. In addition, it is
a prerequisite of the static analysis I am considering. I am aware of
the LinkTimeOptimisation branch, which would make it somehow obsolete
(or somehow hidden).

The expected results of such global static analysis go mostly in two
directions:

First, a better diagnosis for human programmers using such a tool.
Essentially, it works like a sophisticated lint by producing warnings.
The challenge is to avoid spurious warnings. This is my personal focus.
Strangely, many industrial partners (and also, informally, outside
developers) are interested in such a feature. In pompous wording, this
is something like a -Wgive-very-costly-warnings option which, in
addition to -Wall and other warnings, is able to deliver additional,
contextual warnings. If you want a commercial blurb for the usefulness
of such tools, look at http://www.polyspace.com/ (and since they have
clients, it shows that such techniques can sometimes be useful and are
used). So there is a niche market for costly diagnosis (static hazard
detection).

Second, an opportunity to optimize even more (but this is more a task
for INRIA or other partners, in particular Sebastian Pop or Albert
Cohen; I am not an expert at all on these issues).
The intuition is that global static analysis permits very costly (in
compile time) optimisations, which could be viewed as some -O999 flag
(or more precisely -fvery-costly-optimisations) that some users might
want. Intuitively, if the compiler happens to know that, in some
specific (call) context, a given pointer is never null, it could
optimize even more. Likewise for some asserts, alias analysis, etc.
And there is a niche market for such costly optimisations: at
Livermore, CEA, NASA, Google... there is software that runs for months
of CPU time, and improving its runtime by a few percent (or even a few
per mille) is worthwhile even if its compilation time is greatly
increased.

I definitely do not want to give real figures (my 1000% etc. figures
are completely fictional, though roughly inspired by previous
experience and tools). But I do believe that there is a place for
expensive compilation techniques, which I admit will rarely be used.
(I would also tell you that I rarely compile stuff with -O3 myself;
usually -O1 -g is ok for me.)

> However, you seem to be trying to propose a mechanism for the
> *mainline of gcc*.
> If you want to get something into the *mainline of gcc*, you need to
> be in touch with the concerns that people have about slowing down the
> compiler 3%, because that is what our *mainline gcc* customers care
> about.

I apologize again for having given the wrong impression that I do not
care about slowing down the compiler by 3%. I definitely do care about
this, but I do not yet care much about the speed of the static analysis
I am trying to design.

I definitely agree with you, and I apologize if I expressed myself so
badly that you thought I am not concerned about a 3% loss in the main
compiler with its usual settings. Of course I do not want to put such
a weight on GCC running with its usual settings (-O2 seems to be the
most often used option).
So I am trying to design machinery which can be disabled (and which,
when disabled, does not impact GCC compile time much) but which can
also be enabled, at the cost of a longer compilation time. I hope
people on this mailing list will help me to achieve this subgoal.

> > > This just isn't that big a problem. If you want to associate these
> > > things with trees, put them in annotations, mark them GTY ((skip)) and
> > > explicitly manage their lifetime. It is highly unlikely that they are
> > > going to live for some indeterminate amount of time.
> >
> > So basically you are suggesting me to add some kind of specific
> > garbage collection machinery within my pass. Could be ok, but painful.
>
> This is what the entire rest of the compiler does. Seriously.

Thanks for the tip, but please let me elaborate.

> That's the whole point: *We don't keep things in GC if they have
> determinate lifetimes, because our GC is too slow*.

Ok, I understood your point, and I apologize for having expressed
myself wrongly. (English is not my native language, and I am a newbie
within the GCC community.)

However, the static analysis passes I am starting to design are
expected to allocate more or less short-lived data, for which a garbage
collector seems unavoidable to me. I am not able to code explicit
deallocation calls (like ggc_free or xfree) because I do not know
precisely when to free the data. I do know that data will be allocated
continuously and that most of it will become garbage inside my passes.
This data has a bounded lifetime (since most of it is useless after the
static analysis pass), but since the passes are expected to last quite
a long time, I need to recover garbage, and I thought that using the
ggc_collect routine would be best. In other words, I have data which
has a bounded lifetime, but the bound is very big and I need to collect
garbage before reaching this bound.

> If you want to implement finalizers on your branch, go for it.
> You should just be aware you are going to run into a lot of
> resistance if you ever try to submit these patches for mainline,
> because of speed issues.

Here is my new, restricted proposal regarding the handling of finalized
objects. I just discussed it with Sebastian Pop (we meet in person
nearly every day and he helps me a lot in my understanding of GCC), and
he helped me to restrict and reformulate my proposal.

My even smaller proposal is:

1. Add a GTY((mark_hook("routine_name"))) option to GTY. When this
   option is not used, the generated gt-*.h files are unchanged from
   what they are currently, so there is no additional penalty in that
   very common case.

2. In the very few structures where this mark_hook("routine_name")
   option is specified, it asks gengtype to generate a marking routine
   with a call to the mark hook.

To be specific, we have in gcc/varasm.c

    struct constant_descriptor_tree GTY(())
    {
      rtx rtl;
      tree value;
      hashval_t hash;
    };

Then gengtype generates (in $GCCBUILD/gcc/gt-varasm.h) a routine like

    void
    gt_ggc_mx_constant_descriptor_tree (void *x_p)
    {
      struct constant_descriptor_tree * const x
        = (struct constant_descriptor_tree *)x_p;
      if (ggc_test_and_set_mark (x))
        {
          gt_ggc_m_7rtx_def ((*x).rtl);
          gt_ggc_m_9tree_node ((*x).value);
        }
    }

My proposal is that if I changed the code in gcc/varasm.c to

    struct constant_descriptor_tree GTY((mark_hook("cdt_mark_hook")))
    {
      rtx rtl;
      tree value;
      hashval_t hash;
    };

then the marking routine generated by gengtype would become

    void
    gt_ggc_mx_constant_descriptor_tree (void *x_p)
    {
      struct constant_descriptor_tree * const x
        = (struct constant_descriptor_tree *)x_p;
      if (ggc_test_and_set_mark (x))
        {
          cdt_mark_hook ((void*)x);   /* added call */
          gt_ggc_m_7rtx_def ((*x).rtl);
          gt_ggc_m_9tree_node ((*x).value);
        }
    }

Of course I do not claim at all that adding a mark hook to the actual
constant_descriptor_tree is a sensible thing to consider. I am using
this structure only as an illustrative example.

The point is that this GTY((mark_hook( .... ))) does not cost anything
for data not using it. And for the few (finalized) data requiring such
a hook, its only cost is one call in each marking routine, for such
data only.

Then a pass which (like the passes I am considering) wants to have some
kind of finalization could:

- before calling ggc_collect, prepare its own internal marking by
  clearing all its internal marks or whatever, e.g. clearing some
  vector or array;

- provide a mark_hook which sets its internal mark or adds the object
  to the vector (which has been suitably dimensioned to the right
  size);

- just after calling ggc_collect, appropriately handle those data which
  require special finalization.

Daniel Berlin and others, what do you think of this? As I see it, my
proposal does not cost anything for the many passes and data which do
not use this mark_hook trick. And it will be useful for the few passes
which want some limited kind of finalization (or destruction) of a few
objects. Finally, it does not seem too hard to implement.

A bonus could be to provide pre- and post-marking hooks in the
ggc_collect garbage collector. These hooks could be simple (single)
function pointers (which the ggc_collect routine has to test for
non-nullity before calling), or could be a list of hooks to be called.

> This may or may not matter for your project. From my perspective, and
> probably the perspective of most people around here, if your code
> isn't going to *eventually* (even years down the road) end up in
> mainline, it's generally a waste of time and it won't garner community
> support (because nobody will use it in production).

This is definitely agreed and understood by me (and was even two years
ago, when I was just starting to write the GlobalGCC proposal to get
funded).
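
Coming back to the mark_hook idea: the three-step protocol (clear the
pass's records, let the hook record live objects during marking,
finalize the rest afterwards) could be sketched as the toy simulation
below. Everything in it is hypothetical and for illustration only:
struct toy_obj, toy_mark, toy_mark_hook and toy_collect_and_finalize
are invented names, the marking stands in for ggc_collect, and no real
GGC routine is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 8

/* Toy stand-in for a GC-managed object needing finalization.  */
struct toy_obj
{
  bool gc_mark;            /* mark bit owned by the toy collector */
  struct toy_obj *next;    /* single outgoing reference           */
};

/* Pass-side vector of objects seen alive during the last marking.  */
static void *live_vec[MAX_OBJS];
static size_t n_live;

/* Hypothetical mark hook: record that the object was reached.  */
static void
toy_mark_hook (void *p)
{
  assert (n_live < MAX_OBJS);
  live_vec[n_live++] = p;
}

/* Shaped like a generated marking routine with the hook call added.  */
static void
toy_mark (struct toy_obj *x)
{
  if (x != NULL && !x->gc_mark)
    {
      x->gc_mark = true;   /* plays the role of ggc_test_and_set_mark */
      toy_mark_hook (x);   /* the added call */
      toy_mark (x->next);
    }
}

/* Pass-side table of every object that may need finalization.  */
struct tracked { struct toy_obj *obj; bool finalized; };
static struct tracked tracked_tab[MAX_OBJS];
static size_t n_tracked;

static void
toy_track (struct toy_obj *o)
{
  assert (n_tracked < MAX_OBJS);
  tracked_tab[n_tracked].obj = o;
  tracked_tab[n_tracked].finalized = false;
  n_tracked++;
}

static bool
was_seen_live (const void *p)
{
  for (size_t i = 0; i < n_live; i++)
    if (live_vec[i] == p)
      return true;
  return false;
}

/* Returns the number of objects finalized by this collection.  */
static size_t
toy_collect_and_finalize (struct toy_obj *root)
{
  size_t n_finalized = 0;

  /* Step 1: before collecting, clear the pass's internal records
     (and, in this toy, the mark bits of still-tracked objects).  */
  n_live = 0;
  for (size_t i = 0; i < n_tracked; i++)
    if (!tracked_tab[i].finalized)
      tracked_tab[i].obj->gc_mark = false;

  /* Step 2: stand-in for ggc_collect; marking calls the hook.  */
  toy_mark (root);

  /* Step 3: finalize tracked objects the hook never saw.  A real
     collector would already have freed them, so real finalization
     data would have to live outside the collected heap.  */
  for (size_t i = 0; i < n_tracked; i++)
    if (!tracked_tab[i].finalized && !was_seen_live (tracked_tab[i].obj))
      {
        tracked_tab[i].finalized = true;
        n_finalized++;
      }
  return n_finalized;
}
```

The linear search in was_seen_live is only there to keep the sketch
short; a real pass would presumably use an internal mark bit or a hash
table instead.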

> Research for the
> sake of research is great, don't get me wrong, but given the limited
> amount of time most GCC developers have to spend, it means we each
> pick and choose the projects we work on and try to help contribute to,
> and most people contribute to projects that they see being
> productionized in some short number of years. That said, it's your
> time and money, you are free to do as you wish with it.

No, I am not free to do what I wish. The GlobalGCC project (which funds
my work) has strict constraints which I have to comply with. And it is
not much research (and certainly not purely academic research), just
the implementation within GCC of techniques which have already been
*sold* by commercial (expensive) static analysers for about ten years,
and for which a niche market already exists. The academic papers on the
subject started in 1978.

Please be kind enough to comment on adding costly passes which are
almost always disabled, and on my mark_hook proposal, which should only
cost something for the few data and passes using it.

And again, please accept my apologies for my previous poor wording...

Regards.
--
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faïencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***