On Sun, 2013-06-16 at 11:18 +0200, Basile Starynkevitch wrote:
> On Fri, Jun 14, 2013 at 11:21:06PM -0400, David Malcolm wrote:
> > I'm hoping that gcc 4.9 can support multiple "parallel universes" of gcc
> > state within one process, to ultimately support gcc usage as a library,
> > where multiple client threads can each have their own gcc context (or
> > "universe").
> > 
> > One issue with the above is the garbage collector.
> > 
> > I think there are two possible ways in which "universe instances" could
> > interact with the GC:
> > 
> > (a) have the universe instances be GC-managed: all parallel universes
> > share the same heap, requiring a rewrite of the GC code to be
> > thread-safe,
> > 
> > or
> > 
> > (b) have the "universe instance"/context manage GC, so that the state of
> > GC is per-universe: each universe has its own GC heap, entirely
> > independent of each other universe's GC heap.  You can't share GC
> > pointers between universes.
> > 
> > I don't think (a) is feasible.
> 
> 
> I agree, but what would be the purpose to run many threads of GCC in parallel 
> which don't share anything?

I'm thinking of the "embedding GCC as a shared library" case.

Consider a web browser, where each tab or window can have multiple
threads, say, a thread to render HTML, a thread to run JavaScript, etc.
The JavaScript code is to be compiled to machine code to get maximum
speed.  How is each tab to do this?   The author of the web browser
needs to be able to embed a compiler into the process, where each thread
might want to do some compiling, each independently of the other
threads.   The compilation will involve some optimization passes - some
of them the ones already implemented in GCC, but maybe some extra ones
that are specific to the JavaScript implementation.

> At the very least, I would expect predefined global trees to be common to all 
> of them. 
> I'm thinking at least of The global_trees array.

In theory these could be shared between threads, but to me it feels like
a premature optimization - once you start sharing GC-managed objects
between threads, you have to somehow ensure that the threads don't stomp
on each other's data (what happens if two threads try to collect at the
same time?  how atomic are the "mark" operations? etc etc).

Also: the global_trees array is affected by options: char_type_node and
double_type_node are affected by flags (for signedness and precision
respectively).  Some threads in a process might want one set of flags,
and some another.

It seems much simpler to me to declare that every compilation context is
its own island, waste a little RAM on having separate copies of things,
and avoid having interactions between garbage-collectors running in
different threads.
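
A toy sketch of the "island" idea, assuming hypothetical names throughout
(gc_object, gc_heap and context are illustrative and are not GCC's ggc
types): each context owns its heap outright, so a collection in one
context never reads or writes another context's objects.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical GC-managed object; real GCC objects are far richer.
struct gc_object {
    bool marked = false;
    int payload = 0;
};

// Hypothetical per-context heap. Assumption: only one thread ever
// touches a given heap, so no locking appears anywhere in it.
class gc_heap {
public:
    gc_object *alloc(int payload) {
        objects_.push_back(std::unique_ptr<gc_object>(new gc_object()));
        objects_.back()->payload = payload;
        return objects_.back().get();
    }

    void add_root(gc_object *obj) { roots_.push_back(obj); }

    // Simplified mark-and-sweep: objects hold no references to each
    // other here, so marking the roots is the whole mark phase.
    // Returns the number of objects freed.
    std::size_t collect() {
        for (gc_object *root : roots_) root->marked = true;
        const std::size_t before = objects_.size();
        std::vector<std::unique_ptr<gc_object>> live;
        for (auto &obj : objects_) {
            if (obj->marked) {
                obj->marked = false;  // reset for the next collection
                live.push_back(std::move(obj));
            }
        }
        objects_ = std::move(live);
        return before - objects_.size();
    }

    std::size_t size() const { return objects_.size(); }

private:
    std::vector<std::unique_ptr<gc_object>> objects_;
    std::vector<gc_object *> roots_;
};

// Hypothetical compilation context: the heap is a plain member, so
// GC pointers cannot be shared between contexts by construction.
struct context {
    gc_heap heap;
};
```

The cost is the duplicated RAM mentioned above; the gain is that the
questions about concurrent collection and atomic marking never arise.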


> And don't forget plugins, which can (and do, for MELT) run the Ggc collector, 
> and use the 
> PLUGIN_GGC_START, PLUGIN_GGC_MARKING, PLUGIN_GGC_END

Good point.  If we have multiple contexts (or "universes"), then when
GCC calls into a plugin, the plugin could somehow be told which
context/universe is calling it.  We could do this either by adding an
extra argument to the callback function, or by having a thread-local
variable containing a (context*).   We're probably going to need the
latter for other reasons, so perhaps that's the way to go.  It has the
nice property that nothing changes for plugins for the classic "GCC as a
suite of binaries" case.
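
A minimal sketch of the thread-local approach, under stated assumptions:
current_context, context_scope and run_compile are hypothetical names,
and the callback merely mimics the two-pointer shape of a plugin
callback.  It shows the callback keeping its existing signature while
still finding out which context invoked it.

```cpp
#include <cassert>
#include <string>

// Hypothetical compilation context; not a real GCC type.
struct context {
    std::string name;
    int gc_starts_seen = 0;
};

// Hypothetical thread-local pointer to the context active on this
// thread. Each thread gets its own copy, initially null.
thread_local context *current_context = nullptr;

// Plugin-style callback: no extra argument is needed, the context is
// looked up through the thread-local variable.
void plugin_on_ggc_start(void * /*event_data*/, void * /*user_data*/) {
    if (context *ctx = current_context) {
        ++ctx->gc_starts_seen;
    }
}

// RAII guard that installs a context for the current thread and
// restores the previous value when it goes out of scope.
class context_scope {
public:
    explicit context_scope(context *ctx) : saved_(current_context) {
        current_context = ctx;
    }
    ~context_scope() { current_context = saved_; }
    context_scope(const context_scope &) = delete;
    context_scope &operator=(const context_scope &) = delete;

private:
    context *saved_;
};

// Stand-in for a compilation that triggers some collections.
void run_compile(context &ctx, int collections) {
    context_scope scope(&ctx);
    for (int i = 0; i < collections; ++i) {
        plugin_on_ggc_start(nullptr, nullptr);
    }
}
```

In the classic single-context case the pointer would simply be set once
at startup, which is why existing plugins would see no change.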

> I do think that making (in the long term) GCC usable as a library (like LLVM 
> is) is a 
> worthwhile goal, but I don't think that aiming that future library to be 
> multi-threadable
> (or thread-friendly) is very realistic. At least, we should make the two 
> goals separate:
> first, make GCC a library, then (and later) make that library thread friendly.

There are various issues with GCC as a library:

* the simple mechanics of building with -fPIC/-fpic, and generating .so
files rather than .a files.  Position-independent code gives a
performance hit, so we'd also want a configure-time switch to enable it,
so that the classic "GCC as a suite of monolithic binaries" use-case
doesn't get slower.

* having a stable API that people can write to.

* how does someone embed the code in their program in a way that's sane?
They can't just call toplev_main.

* global state: I've been looking at state within GCC, and there's a lot
of "global state" i.e. state that persists during the lifetime of the
process.  If GCC is to be usable as a library, I think we need to fix
that state, otherwise you can't sanely run the compiler more than once
within the lifetime of one process.  Thread-safety is a cousin to this
problem: I think that if we fix global state, the fix for thread-safety
is also doable.
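
To illustrate the global-state point, here is a rough sketch with
hypothetical names (options, context, compile and the signed_char field
are all made up for illustration and are not GCC APIs).  State that
would otherwise live in file-scope variables becomes members of a
context object, so a second compilation in the same process starts
clean, and two contexts can use different flags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-context options; the field name is illustrative.
struct options {
    bool signed_char = true;
};

// Hypothetical context holding what would otherwise be global state.
class context {
public:
    explicit context(const options &opts) : opts_(opts) {}

    // Stand-in for a compilation: an empty input records an error.
    bool compile(const std::string &source) {
        if (source.empty()) {
            diagnostics_.push_back("empty translation unit");
            return false;
        }
        return true;
    }

    bool char_is_signed() const { return opts_.signed_char; }
    std::size_t error_count() const { return diagnostics_.size(); }

private:
    options opts_;                          // flags differ per context
    std::vector<std::string> diagnostics_;  // no longer process-wide
};
```

Because nothing here outlives the context object, destroying it and
creating another is enough to "run the compiler again" within one
process.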

FWIW, as mentioned in another reply on this thread, I've been writing up
some notes that I hope can form a plan for removing global state from
GCC; I hope to post it soon (though it's gotten *very* long).

> 
> 
> So I might not be very happy of your patch ....
> 
> Regards.

