On Tue, May 02, 2006 at 09:07:19PM +0200, Laurynas Biveinis wrote:
> Having that in mind, also because I actually like doing
> infrastructural projects, cleaning up, speeding up things instead of
> implementing new features, I think I have found a suitable area for my
> project: garbage collection. I was looking for a compiler speed-up
> project yet avoid sinking into tree/RTL internals, as I doubt I would
> learn them effectively in time.

Hi Laurynas,

I wrote a lot of the current zone collector. Before that, Daniel Berlin
did a lot of work on it. I really don't think I have time to mentor an
SoC project (Daniel, do you, maybe?), but I'd be glad to talk to you
about possible plans.

> - Investigate possibility of Boehm GC usage for compiler proper.
> Project page mentions inconclusive previous results here. I tried to
> google them in mailing list archives to no avail. Could somebody
> provide pointers to the previous investigation?

I don't know of any offhand (which doesn't mean there aren't any).

> - Assuming that Boehm GC turns out to be unusable for the compiler,
> finish the zone collector. Again, searching mailing list about what's
> unfinished was not very fruitful.

What's there works. I don't remember if it is sufficiently portable,
e.g. to systems without mmap, to use as the default; and it probably
still needs some performance tuning, although I did a lot.

> - Assuming zone collector is done well before deadline, tune the
> collector by creating special zones for data with special lifetime.

This turned out not to make a huge amount of difference; the problem is
that the lifetimes of things are unclear in GCC and often long. I wrote
a collector which could separate out data only referenced by a single
function into its own zones; less than 30% of all data in a GCC run is
specific to a function, because things like types and constants are not.
I think it was even less. I don't have the statistics handy.

The copying part of the collector worked well. The partitioning was
less useful.

My plan was to eventually enable generational GC; i.e. to be able to
collect more frequently and assert that data for other functions had not
changed since the last collection, since we optimize one function at a
time. You could even mprotect the other zones read-only under
--enable-checking to verify that we didn't clobber other functions.
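
To make the mprotect idea above concrete, here is a minimal sketch. It
is not code from the zone collector. It assumes a POSIX system with
mmap and mprotect, and the struct zone type and the zone_* helpers are
hypothetical names invented for illustration only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical zone descriptor; NOT the real ggc-zone data structure. */
struct zone {
    void  *base;   /* start of the page-aligned mapping */
    size_t size;   /* length of the mapping in bytes */
};

/* Map `pages` anonymous read/write pages for a zone. Returns 0 on success. */
static int zone_create(struct zone *z, size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    z->size = pages * (size_t)page;
    z->base = mmap(NULL, z->size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (z->base == MAP_FAILED) {
        z->base = NULL;
        return -1;
    }
    return 0;
}

/* Make a zone read-only: any stray write into it now raises SIGSEGV. */
static int zone_freeze(const struct zone *z)
{
    assert(z->base != NULL);
    return mprotect(z->base, z->size, PROT_READ);
}

/* Make a zone writable again, e.g. before compiling its function. */
static int zone_thaw(const struct zone *z)
{
    assert(z->base != NULL);
    return mprotect(z->base, z->size, PROT_READ | PROT_WRITE);
}

static void zone_destroy(struct zone *z)
{
    if (z->base != NULL)
        munmap(z->base, z->size);
    z->base = NULL;
    z->size = 0;
}

/* Freeze the zone of "another function" while writing to the current
   one. Returns 0 if every step behaved as expected. */
int zone_demo(void)
{
    struct zone other, current;
    int rc = -1;

    if (zone_create(&other, 1) != 0)
        return -1;
    if (zone_create(&current, 1) != 0) {
        zone_destroy(&other);
        return -1;
    }

    strcpy(other.base, "function A data");
    if (zone_freeze(&other) == 0) {
        /* The current zone stays writable; the frozen one stays readable. */
        strcpy(current.base, "function B data");
        if (strcmp(other.base, "function A data") == 0
            && strcmp(current.base, "function B data") == 0)
            rc = 0;
        if (zone_thaw(&other) != 0)
            rc = -1;
    }

    zone_destroy(&current);
    zone_destroy(&other);
    return rc;
}
```

In a checking build, a write into a frozen zone would fault immediately
at the offending store, which is the point: the clobber is caught where
it happens rather than at the next collection. mprotect works on whole
pages, so this only helps if each zone owns its pages outright.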

The problem was that the global zone was so large that collection time
was still high.

On March 13th 2005 I posted the copying collector to gcc-patches; you
can find it in the archives. That was good for a performance boost by
itself of a few percent, but used a lot of RAM; something cleverer might
be doable that would use less RAM; I don't really know.

I don't think I ever posted the automatic partitioning code; that
computer is off right now, but I can dig out the code if you want to see
it.

--
Daniel Jacobowitz
CodeSourcery