On Sun, 2013-06-16 at 11:18 +0200, Basile Starynkevitch wrote:
> On Fri, Jun 14, 2013 at 11:21:06PM -0400, David Malcolm wrote:
> > I'm hoping that gcc 4.9 can support multiple "parallel universes" of gcc
> > state within one process, to ultimately support gcc usage as a library,
> > where multiple client threads can each have their own gcc context (or
> > "universe").
> >
> > One issue with the above is the garbage collector.
> >
> > I think there are two possible ways in which "universe instances" could
> > interact with the GC:
> >
> > (a) have the universe instances be GC-managed: all parallel universes
> > share the same heap, requiring a rewrite of the GC code to be
> > thread-safe,
> >
> > or
> >
> > (b) have the "universe instance"/context manage GC, so that the state of
> > GC is per-universe: each universe has its own GC heap, entirely
> > independent of each other universe's GC heap. You can't share GC
> > pointers between universes.
> >
> > I don't think (a) is feasible.
>
>
> I agree, but what would be the purpose to run many threads of GCC in parallel
> which don't share anything?

I'm thinking of the "embedding GCC as a shared library" case.

Consider a web browser, where each tab or window can have multiple
threads, say, a thread to render HTML, a thread to run JavaScript, etc.
The JavaScript code is to be compiled to machine code to get maximum
speed.  How is each tab to do this?  The author of the web browser needs
to be able to embed a compiler into the process, where each thread might
want to do some compiling, each independently of the other threads.  The
compilation will involve some optimization passes - some of them the
ones already implemented in GCC, but maybe some extra ones that are
specific to the JavaScript implementation.

> At the very least, I would expect predefined global trees to be common to all
> of them.
> I'm thinking at least of The global_trees array.

In theory these could be shared between threads, but to me it feels like
a premature optimization - once you start sharing GC-managed objects
between threads, you have to somehow ensure that the threads don't stomp
on each other's data (what happens if two threads try to collect at the
same time?  how atomic are the "mark" operations?  etc etc).

Also: the global_trees array is affected by options: char_type_node and
double_type_node are affected by flags (for signedness and precision
respectively).  Some threads in a process might want one set of flags,
and some another.

It seems much simpler to me to declare that every compilation context is
its own island, waste a little RAM on having separate copies of things,
and avoid having interactions between garbage-collectors running in
different threads.

> And don't forget plugins, which can (and do, for MELT) run the Ggc collector,
> and use the
> PLUGIN_GGC_START, PLUGIN_GGC_MARKING, PLUGIN_GGC_END

Good point.  If we have multiple contexts (or "universes"), then when
GCC calls into a plugin, the plugin could somehow be told which
context/universe is calling it.
We could do this either by adding an extra argument to the callback
function, or by having a thread-local variable containing a (context*).
We're probably going to need the latter for other reasons, so perhaps
that's the way to go.  It has the nice property that nothing changes for
plugins for the classic "GCC as a suite of binaries" case.

> I do think that making (in the long term) GCC usable as a library (like LLVM
> is) is a
> worthwhile goal, but I don't think that aiming that future library to be
> multi-threadable
> (or thread-friendly) is very realistic. At least, we should make the two
> goals separate:
> first, make GCC a library, then (and later) make that library thread friendly.

There are various issues with GCC as a library:

  * the simple mechanics of building with -fPIC/-fpic, and generating
    .so files rather than .a files.  Building as position-independent
    code gives a performance hit, so we'd also want a configure-time
    switch to enable it, so that the classic "GCC as a suite of
    monolithic binaries" use-case doesn't get slower.

  * having a stable API that people can write to.

  * how does someone embed the code in their program in a way that's
    sane?  They can't just call toplev_main.

  * global state: I've been looking at state within GCC, and there's a
    lot of "global state" i.e. state that persists during the lifetime
    of the process.  If GCC is to be usable as a library, I think we
    need to fix that state, otherwise you can't sanely run the compiler
    more than once within the lifetime of one process.  Thread-safety
    is a cousin to this problem: I think that if we fix global state,
    the fix for thread-safety is also doable.

FWIW, as mentioned in another reply on this thread, I've been writing up
some notes that I hope can form a plan for removing global state from
GCC; I hope to post it soon (though it's gotten *very* long).

>
>
> So I might not be very happy of your patch ....
>
> Regards.
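
To make the thread-local option discussed above concrete, here is a
minimal sketch in C.  Everything in it is hypothetical and for
illustration only: struct gcc_context, current_context,
collect_in_context and on_ggc_start are made-up names, not GCC APIs,
and the "GC heap" is reduced to a single byte counter.  The callback
only mirrors the general (event data, user data) shape of a plugin
callback; it discovers which context it is running in by reading the
thread-local pointer, so its signature does not need to change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-context ("universe") state; not a real GCC type. */
struct gcc_context {
    int id;                /* identifies the universe */
    int flag_signed_char;  /* example of an option that may differ per context */
    size_t gc_bytes_live;  /* stand-in for this context's private GC heap */
};

/* One pointer per thread (C11 thread-local storage): each client
   thread only ever sees the context it installed itself. */
static _Thread_local struct gcc_context *current_context = NULL;

/* Assumed callback shape: (event data, user data). */
typedef void (*plugin_callback)(void *event_data, void *user_data);

/* Example plugin hook: it is not passed the context explicitly,
   it looks it up through the thread-local pointer. */
static void on_ggc_start(void *event_data, void *user_data)
{
    int *seen_id = user_data;
    (void) event_data;
    *seen_id = current_context ? current_context->id : -1;
}

/* Pretend to run a collection for CTX on the calling thread:
   install the context, notify the plugin, "collect", then restore
   whatever context was installed before.  Returns the context id
   that the plugin observed. */
static int collect_in_context(struct gcc_context *ctx, plugin_callback cb)
{
    struct gcc_context *saved = current_context;
    int seen_id = -1;

    current_context = ctx;
    cb(NULL, &seen_id);        /* plugin sees ctx via current_context */
    ctx->gc_bytes_live = 0;    /* only this context's heap is touched */
    current_context = saved;
    return seen_id;
}
```

In a threaded embedder each client thread would call something like
collect_in_context with its own context; since no GC state is shared
between contexts, no locking is needed around the collector in this
sketch.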