On Tue, May 02, 2006 at 09:07:19PM +0200, Laurynas Biveinis wrote:
> Having that in mind, also because I actually like doing
> infrastructural projects, cleaning up, speeding up things instead of
> implementing new features, I think I have found a suitable area for my
> project: garbage collection. I was looking for a compiler speed-up
> project that would avoid sinking into tree/RTL internals, as I doubt
> I would learn them effectively in time.

Hi Laurynas,

I wrote a lot of the current zone collector.  Before that, Daniel
Berlin did a lot of work on it.  I really don't think I have time to
mentor an SoC project (Daniel, do you, maybe?), but I'd be glad to talk
to you about possible plans.

> - Investigate possibility of Boehm GC usage for compiler proper.
> Project page mentions inconclusive previous results here. I tried to
> google them in mailing list archives to no avail. Could somebody
> provide pointers to the previous investigation?

I don't know of any offhand (which doesn't mean there isn't any).

> - Assuming that Boehm GC turns out to be unusable for the compiler,
> finish the zone collector. Again, searching mailing list about what's
> unfinished was not very fruitful.

What's there works.  I don't remember whether it is portable enough
(e.g. to systems without mmap) to be used as the default, and it
probably still needs some performance tuning, although I did a lot.

> - Assuming zone collector is done well before deadline, tune the
> collector by creating special zones for data with special lifetime.

This turned out not to make a huge amount of difference; the problem is
that the lifetimes of things are unclear in GCC and often long.  I
wrote a collector which could separate out data only referenced by a
single function into its own zones; less than 30% of all data in a GCC
run is specific to a function, because things like types and constants
are not.  I think it was even less.  I don't have the statistics handy.
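
For illustration, the partitioning idea can be sketched as two bump
allocators: a function zone that is released wholesale once a function
is finished, and a global zone for types, constants and anything else
that outlives it.  This is a minimal sketch, not the partitioning code
mentioned in this mail; `arena_zone` and its helpers are hypothetical
names, and a real collector would first have to prove that nothing
outside the function still points into the function zone.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump-allocated zone; not GCC's zone collector. */
struct arena_zone {
  char *base;   /* start of the zone's memory */
  size_t size;  /* total bytes available */
  size_t used;  /* bytes handed out so far */
};

/* Reserve SIZE bytes for the zone.  Returns 0 on success, -1 on failure. */
static int arena_init(struct arena_zone *z, size_t size)
{
  z->base = malloc(size);
  z->size = size;
  z->used = 0;
  return z->base != NULL ? 0 : -1;
}

/* Hand out N bytes, rounded up to a multiple of 16 so that later
   allocations stay aligned.  Returns NULL when the zone is full. */
static void *arena_alloc(struct arena_zone *z, size_t n)
{
  if (n > SIZE_MAX - 15)
    return NULL;
  n = (n + 15) & ~(size_t) 15;
  if (n > z->size - z->used)
    return NULL;
  void *p = z->base + z->used;
  z->used += n;
  return p;
}

/* Drop everything in the zone at once, e.g. after finishing a function. */
static void arena_reset(struct arena_zone *z)
{
  z->used = 0;
}

/* Give the zone's memory back to the system. */
static void arena_destroy(struct arena_zone *z)
{
  free(z->base);
  z->base = NULL;
  z->size = z->used = 0;
}
```

Resetting the function zone costs nothing per object, but it only pays
off for the share of data that is truly function-local; whatever lands
in the global zone still has to be collected the usual way.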

The copying part of the collector worked well.  The partitioning was
less useful.
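
As a reminder of what copying collection involves, here is a minimal
sketch of a textbook semispace (Cheney-style) copier.  It is not the
collector posted to gcc-patches and makes no claim about how that one
is structured; `heap_sketch`, `cell` and the fixed two-pointer layout
are assumptions made purely for illustration.

```c
#include <stddef.h>

#define HEAP_CELLS 8

/* Hypothetical heap object with two pointer fields. */
struct cell {
  struct cell *forward;       /* new address once the cell has been copied */
  struct cell *left, *right;  /* references to other cells, or NULL */
  int value;
};

/* Two equally sized spaces; only one is active at a time. */
struct heap_sketch {
  struct cell space[2][HEAP_CELLS];
  int active;   /* index of the space currently allocated from */
  size_t used;  /* cells in use in the active space */
};

static struct cell *heap_alloc(struct heap_sketch *h, int value)
{
  if (h->used == HEAP_CELLS)
    return NULL;
  struct cell *c = &h->space[h->active][h->used++];
  c->forward = NULL;
  c->left = c->right = NULL;
  c->value = value;
  return c;
}

/* Copy C into to-space unless that already happened; return its new address. */
static struct cell *copy_cell(struct cell *c, struct cell *to, size_t *next)
{
  if (c == NULL)
    return NULL;
  if (c->forward == NULL) {
    to[*next] = *c;
    c->forward = &to[*next];
    (*next)++;
  }
  return c->forward;
}

/* Copy everything reachable from ROOTS into the idle space, then swap. */
static void heap_collect(struct heap_sketch *h, struct cell **roots,
                         size_t nroots)
{
  struct cell *to = h->space[1 - h->active];
  size_t next = 0;

  for (size_t i = 0; i < nroots; i++)
    roots[i] = copy_cell(roots[i], to, &next);

  /* Cells between SCAN and NEXT are copied but still hold old pointers. */
  for (size_t scan = 0; scan < next; scan++) {
    to[scan].left = copy_cell(to[scan].left, to, &next);
    to[scan].right = copy_cell(to[scan].right, to, &next);
  }

  h->active = 1 - h->active;
  h->used = next;
}
```

Live data ends up compacted at the start of the new space and garbage
is never visited, but a scheme like this keeps a second space of the
same size in reserve.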

My plan was to eventually enable generational GC; i.e. to be able to
collect more frequently and assert that data for other functions had
not changed since the last collection, since we optimize one function
at a time.  You could even mprotect the other zones read-only under
--enable-checking to verify that we didn't clobber other functions.
The problem was that the global zone was so large that collection time
was still high.
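
The mprotect check could look roughly like the sketch below: zones
belonging to other functions are made read-only while one function is
being optimized, so a stray write faults immediately.  This is only an
illustration under POSIX assumptions; `zone_sketch`, `zone_freeze` and
`zone_thaw` are hypothetical names, not part of the zone collector.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical page-aligned zone; not GCC's struct alloc_zone. */
struct zone_sketch {
  void *base;  /* start of the mapping, page aligned */
  size_t len;  /* length in bytes, a multiple of the page size */
};

/* Map PAGES pages of zeroed, writable memory.  Returns 0 on success. */
static int zone_create(struct zone_sketch *z, size_t pages)
{
  long page = sysconf(_SC_PAGESIZE);
  if (page <= 0)
    return -1;
  z->len = pages * (size_t) page;
  z->base = mmap(NULL, z->len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (z->base == MAP_FAILED) {
    z->base = NULL;
    return -1;
  }
  return 0;
}

/* Make the zone read-only; any later write raises SIGSEGV. */
static int zone_freeze(const struct zone_sketch *z)
{
  return mprotect(z->base, z->len, PROT_READ);
}

/* Make the zone writable again, e.g. when its function is current. */
static int zone_thaw(const struct zone_sketch *z)
{
  return mprotect(z->base, z->len, PROT_READ | PROT_WRITE);
}

/* Unmap the zone. */
static void zone_destroy(struct zone_sketch *z)
{
  if (z->base != NULL)
    munmap(z->base, z->len);
  z->base = NULL;
  z->len = 0;
}
```

A checking build would freeze every zone except the current function's
(and the global zone) before running a pass and thaw them afterwards.
Because protection works on whole pages, this only makes sense if each
zone owns its pages outright.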

On March 13th 2005 I posted the copying collector to gcc-patches; you
can find it in the archives.  By itself it was good for a performance
boost of a few percent, but it used a lot of RAM.  Something cleverer
that uses less RAM might be doable; I don't really know.  I don't
think I ever posted the automatic partitioning code.  That computer is
off right now, but I can dig out the code if you want to see it.

-- 
Daniel Jacobowitz
CodeSourcery
