Re: destruction of GTY() data

Basile STARYNKEVITCH Sat, 09 Dec 2006 12:52:15 -0800

On Sat, Dec 09, 2006 at 11:03:21AM -0500, Daniel Berlin wrote:
> On 12/9/06, Basile STARYNKEVITCH <[EMAIL PROTECTED]> wrote:
> >On Fri, Dec 08, 2006 at 07:09:23PM -0500, Daniel Berlin wrote:
> >
> >> You see, we currently waste a lot of memory to avoid the fact that our
> >> GC is very slow.
> >> We still take it on the chin when it comes to locality.  Previous
> >> things such as moving basic blocks from alloc_pools (which are
> >> contiguous) to gc'd space cost us 2-3% compilation time alone, because
> >> of how bad our GC places objects.
> >


I wrote the paragraph below, but it was poorly phrased, and I apologize
for having misstated my thoughts.

> >Even 25% of current GCC compilation time is noise to me. If I
> >achieve 1000% of current GCC -O3 compilation time, I will be very
> >proud of myself. So I really do not care about 3%, and I thought that
> >my proposal would not cost much if it is not used (because if there
> >are no finalized objects, GCC won't run much slower...).

The 3%, 25% and 1000% figures above (which are only guesses) apply
only to global static analysis passes. They do not apply to the usual
GCC passes that everyone runs with -O2 (or even -O3). I should have
emphasized that what I do not care about is 3% of the overall cost of
future expensive static analysis passes (which is what the GlobalGCC
project is about), not the usual performance of the compiler, which,
like everyone, I do care about. Of course 3% is meaningful in a usual
compiler run (typically -O2), and I am careful to avoid degrading it.
Sorry for having phrased my sentence badly. To repeat myself: the huge
overhead I am considering applies only to expensive static analysis
passes, which are usually disabled. Hence I am discussing here some
ideas to avoid degrading the compiler while still bringing *usually
disabled* features which could be useful to global static analysis
and perhaps also to other (expensive) passes of GCC. I care a lot
about not increasing compile time when these expensive passes are
disabled.

My expertise is more in static analysis that produces human-readable
warnings, much like what commercial tools such as PolySpace offer.
This means analysing small to medium-sized programs (typically less
than a few hundred thousand lines of source code) whose source is
fully available. And my contractual obligations within GlobalGCC
(w.r.t. those funding my work) are mostly to deliver some global
static analysis passes.

> >
> >>
> That's great. If you want to make a compiler useless to almost
> everyone, go for it.  Do it on a branch, go wild.  I'm sure there are
> 6 or 7 people in the world who will use it because it matters to them
> that much.

First, the GlobalGCC project is about adding global static analysis,
which of course can and should usually be disabled. So of course I am
careful not to degrade the trunk. All the equivalent static analysis
tools I know of run much slower than a compiler, and their users
expect and accept this. And there are people who are still interested
in them and buying them. So there is a niche market for such a
project (otherwise it would not have been funded). For example, Airbus
(which is part of the GlobalGCC consortium) routinely uses static
analysis tools to analyse industrial-strength critical code (flying
in the A330, A350 and A380). Also, Mandriva is part of the consortium
because they want a tool which helps them port all the software in
the Mandriva Linux distribution to 64-bit hosts (this task currently
costs them dozens of man-years). Other industrial companies, in the
consortium and outside it, also care about static analysis (but I am
not speaking for any of them).

The end result of GlobalGCC should be something like a set of
additional passes, which can be disabled both at configuration time
and at GCC compile time and which are usually disabled, unless a
-fdo-globalgcc-analysis [or call it what you like] is explicitly given
to the resulting GCC (which must have been configured at build time
with something like --enable-globalgcc-analysis). Do you agree a
priori that providing a set of passes which are usually disabled
should not significantly impact the daily performance of GCC?

When the user requests them (through -fdo-globalgcc-analysis or
whatever other flag) and GCC has been configured at build time to
provide them, additional global static analysis occurs. Otherwise,
these passes are not called at all. Again, a gate in a struct
tree_opt_pass which starts with

        if (!flag_globalgcc_analysis) return 0; 

should not cost much (much less than 1%, hopefully even less than
0.1%) in the usual case where flag_globalgcc_analysis is cleared.

When requested, we are expected to provide whole-program static
analysis. This means that we need the -fwhole-program and -combine
flags to GCC. By the way, the current -fwhole-program option is rarely
used: Google Code Search for fwhole-program gives only 20 occurrences,
of which only one is outside of GCC (and inside a comment). Hence,
apparently -fwhole-program is not used at all in open source code
today! Still, I think that -fwhole-program is useful and should not be
removed. In addition, it is a prerequisite for the static analysis I
am considering. I am aware of the LinkTimeOptimisation branch, which
could make it somewhat obsolete (or hidden).

The expected results of such global static analysis are mostly in two
directions:

   first, better diagnostics for human programmers using such a
   tool. Essentially, it works like a sophisticated lint by producing
   warnings. The challenge is to avoid spurious warnings. This is my
   personal focus. Interestingly, many industrial partners (and also,
   informally, outside developers) are interested in such a
   feature. In pompous wording, this is something like a
   -Wgive-very-costly-warnings option, which, in addition to -Wall and
   other warnings, is able to deliver additional contextual
   warnings. If you want a commercial blurb for the usefulness of such
   tools, look at http://www.polyspace.com/ (and since they have
   clients, it shows that such techniques are sometimes useful and
   are used). So there is a niche market for costly diagnostics
   (static hazard detection).

   second, an opportunity to optimize even more (but this is more a
   task for INRIA or other partners, in particular Sebastian Pop or
   Albert Cohen; I am not at all an expert on these issues). The
   intuition is that global static analysis permits very costly (in
   compile time) optimisations, which could be viewed as some -O999
   flag (or more precisely -fvery-costly-optimisations) which some
   users might want. Intuitively, if the compiler happens to know that
   in some specific (call) context a given pointer is never null, it
   could optimize even more. And likewise for some asserts, alias
   analysis, etc. And there is a niche market for such costly
   optimisations: at Livermore, CEA, NASA, Google... there is software
   running for months of CPU time, and improving its runtime by a few
   percent (or even a few per mille) is worthwhile even if its
   compilation time is greatly increased.

I definitely do not want to give real figures (my 1000% etc. figures
are completely fictional, though loosely inspired by previous
experience and tools). But I do believe that there is a place for
expensive compilation techniques, which I admit will rarely be
used. (I should also say that I rarely compile my own code with -O3;
usually -O1 -g is fine for me.)

> However, you seem to be trying to propose a mechanism for the *mainline of 
> gcc*.
> If you want to get something into the *mainline of gcc*, you need to
> be in touch with the concerns that people have about slowing down the
> compiler 3%, because that is what our *mainline gcc* customers care
> about.

I apologize again for having given the wrong impression that I do not
care about slowing down the compiler by 3%. I definitely do care about
this, but I do not yet care much about the speed of the static
analysis I'm trying to design.

I definitely agree with you, and I apologize if I expressed myself so
badly that you thought I was not concerned about a 3% loss in the main
compiler with its usual settings. Of course I do not want to put such
a burden on GCC running with its usual settings (-O2 seems to be the
most commonly used option). So I am trying to design machinery which
can be disabled (and which, when disabled, does not significantly
impact GCC compile time) but which can also be enabled, at the cost of
longer compilation time. I hope people on this mailing list will help
me achieve this subgoal.

> 
> >> This just isn't that big a problem.  If you want to associate these
> >> things with trees, put them in annotations, mark them GTY ((skip)) and
> >> explicitly manage their lifetime.  It is highly unlikely that they are
> >> going to live for some indeterminate amount of time.
> >
> >So basically you are suggesting me to add some kind of specific
> >garbage collection machinery within my pass. Could be ok, but painful.
> >
> This is what the entire rest of the compiler does. Seriously.

Thanks for the tip, but please let me elaborate.
> 
> That's the whole point: *We don't keep things in GC if they have
> determinate lifetimes, because our GC is too slow*.

OK, I understand your point, and I apologize for having expressed
myself badly. (English is not my native language, and I am a newcomer
to the GCC community.)

However, the static analysis passes I am starting to design are
expected to allocate lots of more or less short-lived data, for which
a garbage collector seems unavoidable to me. I am not able to code
explicit deallocation calls (like ggc_free or xfree) because I do not
know precisely when to free the data. I do know that data will be
allocated continuously and that most of it will become garbage within
my passes. This data has a bounded lifetime (since most of it is
useless after the static analysis pass), but since the passes are
expected to last quite a long time, I need to recover garbage, and I
thought that using the ggc_collect routine was best. In other words,
I have data which has a bounded lifetime, but the bound is very large
and I need to collect garbage before reaching it.

> 
> If you want to implement finalizers on your branch, go for it.  You
> should just be aware you are going to run into a lot of resistance if
> you ever try to submit these patches for mainline, because of speed
> issues.




Here is my new, restricted proposal regarding the handling of
finalized objects.

I just discussed this with Sebastian Pop (we meet in person nearly
every day and he helps me a lot in understanding GCC), and he helped
me restrict and reformulate my proposal. My even smaller proposal is:

  1. add a GTY((mark_hook("routine_name"))) option to GTY. When this
   option is not used, the generated gt-*.h files are unchanged from
   what they are currently, so there is no additional penalty in that
   very common case.

  2. In the very few structures where this mark_hook("routine_name")
   option is specified, gengtype is asked to generate a marking
   routine containing a call to the mark hook. To be specific, we
   have in gcc/varasm.c:
        struct constant_descriptor_tree GTY(())
        {
          rtx rtl;
          tree value;
          hashval_t hash;
        };

  gengtype then generates (in $GCCBUILD/gcc/gt-varasm.h) a routine like:

 void
 gt_ggc_mx_constant_descriptor_tree (void *x_p)
 {
   struct constant_descriptor_tree * const x =
     (struct constant_descriptor_tree *)x_p;
   if (ggc_test_and_set_mark (x))
     {
       gt_ggc_m_7rtx_def ((*x).rtl);
       gt_ggc_m_9tree_node ((*x).value);
     }
 }


My proposal is that, if I changed the code in gcc/varasm.c to
        struct constant_descriptor_tree GTY((mark_hook("cdt_mark_hook")))
        {
          rtx rtl;
          tree value;
          hashval_t hash;
        };

then the gengtype-generated marking routine would become:
 void
 gt_ggc_mx_constant_descriptor_tree (void *x_p)
 {
   struct constant_descriptor_tree * const x =
     (struct constant_descriptor_tree *)x_p;
   if (ggc_test_and_set_mark (x))
     {
       cdt_mark_hook ((void *) x);      /* added call */
       gt_ggc_m_7rtx_def ((*x).rtl);
       gt_ggc_m_9tree_node ((*x).value);
     }
 }

Of course I do not claim at all that adding a mark hook to the actual
constant_descriptor_tree is a sensible thing to consider. I am using
this structure only for illustrative purposes as an example. 

The point is that this GTY((mark_hook( .... ))) does not cost anything
for data not using it. And for the few (finalized) data requiring such
a hook, its only cost is one call in the marking routine of that data.

Then a pass which (like the passes I am considering) wants some kind
of finalization could:

before calling ggc_collect, prepare its own internal marking by
clearing all its internal marks (or whatever), e.g. clearing some
vector or array;

provide such a mark_hook, which sets its internal mark or adds the
object to the vector (which has been suitably dimensioned);

just after calling ggc_collect, appropriately handle the data which
requires special finalization.

Daniel Berlin and others, what do you think of this? In my view, my
proposal does not cost anything for the many passes and data which do
not use this mark_hook trick. And it will be useful for the few passes
which want some limited kind of finalization (or destruction) of a few
objects. Finally, it seems not too hard to implement.

A bonus could be to provide pre- and post-marking hooks in the
ggc_collect garbage collector. These hooks could be simple (single)
function pointers (which the ggc_collect routine has to test for
non-nullity before calling them), or a list of hooks to be called.

> 
> This may or may not matter for your project.  From my perspective, and
> probably the perspective of most people around here, if your code
> isn't going to *eventually*  (even years down the road) end up in
> mainline, it's generally a waste of time and it won't garner community
> support (because nobody will use it in production). 

I definitely agree with and understand this (and already did two years
ago, when I was just starting to write the GlobalGCC proposal to get
funding).

> Research for the
> sake of research is great, don't get me wrong, but  given the limited
> amount of time most GCC developers have to spend, it means we each
> pick and choose the projects we work on and try to help contribute to,
> and most people contribute to projects that they see being
> productionized in some short number of years.  That said, it's your
> time and money, you are free to do as you wish with it.

No, I am not free to do what I wish. The GlobalGCC project (funding my
work) has strict constraints with which I have to comply. And it is
not really research (and certainly not purely academic research), just
an implementation within GCC of techniques which commercial
(expensive) static analysers have been *selling* for about ten years,
for which a niche market already exists. The academic papers on it
started in 1978.


Please be kind enough to comment on adding costly passes which are
almost always disabled, and on my mark_hook proposal, which should
only cost something for the few data/passes using it.

And again, please accept my apologies for my previous poor wording...

Regards.
-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359 
8, rue de la Faïencerie, 92340 Bourg La Reine, France
*** opinions are only mine ***
