On Fri, 2012-03-16 at 16:17 +0100, Ludovic Courtès wrote:
> Hello,
> 
> Richard Guenther <richard.guent...@gmail.com> skribis:
> 
> > 2012/3/16 Ludovic Courtès <ludovic.cour...@inria.fr>:
> 
> [...]
> 
> > Well, if you invent new paradigms
> 
> Hmm, I didn’t invent anything here.
> 
> > in your plugin that are not used by GCC itself but use GCC internals
> > (which all plugins have to do ...) then I know where the problem lies
> > ;) I suppose you would be better served by some of the more high-level
> > plugin packages that let you write GCC plugins in python or lisp?
> 
> Sure, even Guile-GCC [0].  :-)
> 
> (Speaking of which, since it mostly uses Guile’s dynamic FFI to
> interface with GCC, and thus dlsyms GCC’s symbols, it will also suffer
> from the transition.  How does the Python plug-in handle that?  Is it
> written in ANSI C, or does it use ctypes?)

The python plugin is written in plain C, much of it autogenerated.  It
mostly dynamically links to GCC, but uses weak references on some of
GCC's symbols (e.g. when using symbols that only exist in the C++
frontend), so we'll see how well that works.
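
For readers unfamiliar with the weak-reference trick, here is a minimal
sketch of the general technique. It is not code from the python plugin,
and the symbol name "hypothetical_cxx_only_hook" is invented for
illustration. It assumes GCC or a compatible compiler on an ELF
platform, where an unresolved weak symbol gets a null address instead of
causing a link error:

```c
#include <stddef.h>

/* Hypothetical symbol (an assumption for this sketch): imagine it is
   only exported by some builds of the host program, e.g. only by the
   C++ frontend.  The weak attribute lets this file link even when the
   symbol is absent. */
extern int hypothetical_cxx_only_hook(void) __attribute__((weak));

/* Returns the hook's result if the symbol was resolved at link/load
   time, or -1 if it is absent. */
int call_hook_if_present(void)
{
    if (hypothetical_cxx_only_hook != NULL)
        return hypothetical_cxx_only_hook();
    return -1;
}
```

In a program that never defines the symbol, call_hook_if_present() takes
the fallback branch and returns -1.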

I have to admit to being nervous about the GCC change to C++.
Elaborating on this further may degenerate into a rant, so here goes,
and I apologize in advance.

I should stress at this point that I don't speak for Red Hat here;
this is just my own opinion (based on having spent much of the last year
working on the gcc python plugin, and having spent many years involved
in free software).

The C++ move is making things more difficult for plugin authors, but
that's a symptom of not having a plugin API.

It's perfectly possible to write clean, maintainable code in C that
implements object-oriented ideas such as information hiding, dynamic
dispatch according to type, and so on.  See e.g. CPython or GObject for
examples.
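
To make that concrete, here is a small sketch of both ideas in plain C.
It is only an illustration: the "shape" names are invented for this
example and are not taken from CPython, GObject or GCC. The struct
layout stays private (information hiding) and a table of function
pointers selects the behaviour at run time (dynamic dispatch):

```c
#include <assert.h>
#include <stdlib.h>

/* Public side: an opaque type plus functions.  In a real project only
   these declarations would appear in the installed header. */
typedef struct shape shape;
shape *shape_new_rect(double w, double h);
shape *shape_new_triangle(double base, double height);
double shape_area(const shape *s);
void shape_free(shape *s);

/* Private side: each "class" supplies a table of function pointers. */
struct shape_ops {
    double (*area)(const shape *s);
};

struct shape {
    const struct shape_ops *ops;  /* selects behaviour at run time */
    double a, b;
};

static double rect_area(const shape *s) { return s->a * s->b; }
static double triangle_area(const shape *s) { return s->a * s->b / 2.0; }

static const struct shape_ops rect_ops = { rect_area };
static const struct shape_ops triangle_ops = { triangle_area };

/* Shared constructor; returns NULL if allocation fails. */
static shape *shape_new(const struct shape_ops *ops, double a, double b)
{
    shape *s = malloc(sizeof *s);
    if (s != NULL) {
        s->ops = ops;
        s->a = a;
        s->b = b;
    }
    return s;
}

shape *shape_new_rect(double w, double h)
{
    return shape_new(&rect_ops, w, h);
}

shape *shape_new_triangle(double base, double height)
{
    return shape_new(&triangle_ops, base, height);
}

/* Dispatches through the object's own table. */
double shape_area(const shape *s)
{
    assert(s != NULL);
    return s->ops->area(s);
}

void shape_free(shape *s) { free(s); }
```

A caller holding a "shape *" calls shape_area() without knowing which
kind of shape it has, and cannot touch the fields directly if only the
declarations at the top are exposed.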

What I'm really hoping for from GCC is a move towards a collection of
libraries that can be embedded in (license-compatible) apps: LLVM is
gaining ground for the use case of programs that need JIT-compilation
(e.g. the X server, or a JVM).  I appreciate that JIT compilation has
different characteristics to a classic ahead-of-time compiler.

C++ is likely to make it more difficult to embed GCC into such apps
(e.g. nasty ordering issues for globals with non-empty constructors; ABI
differences e.g. relating to exception handling; extra complexity of
linking); every large free software project that uses C++ chooses a
subset of the language to use (this could be enforced with a plugin to
the compiler!)

What use cases would a rearchitected GCC have?  I may not win friends
here by posting this link, but, to be frank, see:
  http://llvm.org/ProjectsWithLLVM/
Some examples:
  * the X server: llvmpipe is a software implementation of OpenGL:
compiling OpenGL shader programs into machine code for the CPU for the
case where the GPU isn't up to the job (or for which the vendor isn't
providing specs).
  * IcedTea is using LLVM to add JIT compilation for Java code (see
http://icedtea.classpath.org/wiki/ZeroSharkFaq#Why_was_Shark_written.3F
)
  * JIT compilation within dynamic languages: e.g. Unladen Swallow
(though that's a dead project; most of the time and energy was spent
fixing bugs in LLVM)
  * static analysis tools
  * anywhere we're writing plugins.  Building standalone tools and just
linking is often preferable.  See e.g.:
http://blogs.gnome.org/jessevdk/2011/09/10/gedit-clang-plugin-progress/
where (if I'm reading it right) the gedit editor has gained a plugin
that embeds parts of LLVM.


Proposed outcome
----------------
GCC becomes a family of shared libraries that can be dynamically and
statically linked.  The executables become thin wrappers that invoke
these libraries (perhaps statically linked to minimize startup costs).

Note that I'm *not* proposing a license change.  The code that uses
these libraries would need to be license-compatible with GCC.

The ideal outcome would be to dynamically link this code into a
multi-threaded process, in which multiple dynamically-linked libraries
within the process are linked against the GCC library, working
independently, without knowing that each one is using the library.

Such a reorganization could be worthy of a bump to the major-release id
(e.g. "gcc 5")

Current architectural issues
-----------------------------
GCC is currently a large body of single-threaded code that is compiled
and linked into multiple large executables. 

Issues:
* lots of global state: global variables, static locals, etc.
* related to the above: functions often have side-effects
* memory management: custom allocator, with garbage collection.  I'm not
sure how the GC code locates roots on the stack, and I suspect that
there's currently an assumption that there aren't multiple threads
within the process
* process termination: code can call abort() to handle errors, rather
than being able to report back an error state.  It may be acceptable for
a standalone executable to call "abort", but it's not acceptable when a
library that you're calling into does (e.g. if the X server is using
such a library, your desktop goes away...)
* actual machine-code generation is currently done by the GNU assembler:
would need to turn that into a shared library as well (turning GNU
assembler into a thin wrapper around that library)
* no namespacing: seemingly arbitrary naming convention for symbols.
Would want to add some namespace prefix to the public symbols (vars and
fns), and to the types.
* JIT compilation vs AOT compilation: when an app embeds a JIT it would
want to assemble a more appropriate set of compilation passes
dynamically, I suspect, compared to the full set of AOT passes that GCC
currently has.
* plugin architecture: how would plugins relate to this rearchitected
GCC?
* option-handling is currently very tied to the command line.  Multiple
uses of the library might want different option sets.
* threading?  potentially one could punt this by adding a "Big Compiler
Lock": one big mutex guarding all usage of the APIs.
* the political will to do this
* actually implementing the thing!
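
Several of the issues above (global state, abort() on error, options
tied to the command line) point towards the same shape of interface.
The following is a rough sketch under stated assumptions, not a
proposal for real GCC code: every "demo_" name is hypothetical, and the
"compile" step is a placeholder that only validates its input. State
lives in a context object, options are per-context, and failures are
reported back to the caller:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes returned instead of calling abort(). */
typedef enum {
    DEMO_OK = 0,
    DEMO_ERR_BAD_INPUT
} demo_status;

/* All state that would otherwise be global lives here, so two users
   within one process do not interfere with each other. */
typedef struct demo_context {
    int opt_level;            /* per-context option, not a global flag */
    char last_error[128];     /* human-readable detail for the caller */
} demo_context;

/* Returns NULL if allocation fails. */
demo_context *demo_context_new(int opt_level)
{
    demo_context *ctx = calloc(1, sizeof *ctx);
    if (ctx != NULL)
        ctx->opt_level = opt_level;
    return ctx;
}

void demo_context_free(demo_context *ctx) { free(ctx); }

/* Placeholder for real work: rejects empty input, accepts the rest.
   On failure it records a message and returns an error code. */
demo_status demo_compile(demo_context *ctx, const char *source)
{
    assert(ctx != NULL);
    if (source == NULL || source[0] == '\0') {
        snprintf(ctx->last_error, sizeof ctx->last_error,
                 "%s", "empty source");
        return DEMO_ERR_BAD_INPUT;
    }
    ctx->last_error[0] = '\0';
    return DEMO_OK;
}

const char *demo_last_error(const demo_context *ctx)
{
    assert(ctx != NULL);
    return ctx->last_error;
}
```

With this shape, an embedding application such as the X server can check
the return value and carry on, and two libraries in the same process can
each hold their own context with their own option set.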

It seems that GCC has provided an API for registering plugins, but no
API for the plugins to then actually use...  Perhaps the pain of the C++
move would be alleviated by having an actual C API for plugins to use?
I started writing a possible API for plugins, the idea being to port my
python
plugin to this as a middle layer, but it strikes me that this could also
be used for the embedding case as well.  So perhaps the first step might
be to implement the plugin API, and that could evolve into the
inter-library API that the different parts of a more modular GCC could
use to talk to each other?

The proposed API might look like this:

/*
  Pure C for maximum compatibility
  All macros begin with a "GCC_" prefix
  All symbols begin with a "gcc_" prefix with _ separators, though I
  happen to prefer the CPython style (e.g. "GccBasicBlock_GetIndex");
  bikeshed away!
  (You may only call such a symbol when you have the Big GCC Lock?)
  All types begin with a "gcc_" prefix (again, I'd prefer CPython style
  e.g. "struct GccBasicBlock").
  How acceptable is it to autogenerate parts of the API? (this is what
  I do in my python plugin; naturally I use python for this).
*/
        
/* Compatibility macros: */
#define GCC_API(RETURN_TYPE) extern RETURN_TYPE

/* All types are opaque; internally these might simply embed one of
gcc's real types as its single field; integration with GC could be
interesting though: */
typedef struct gcc_cfg gcc_cfg;
typedef struct gcc_basic_block gcc_basic_block;
typedef struct gcc_edge gcc_edge;

/* Declarations: control flow graphs */

/* gcc_cfg: */
GCC_API(gcc_basic_block *)
gcc_cfg_get_entry(gcc_cfg *cfg);

GCC_API(gcc_basic_block *)
gcc_cfg_get_exit(gcc_cfg *cfg);

/* gcc_basic_block: */
GCC_API(int)
gcc_basic_block_get_index(const gcc_basic_block *bb);

/* gcc_edge: */
GCC_API(gcc_basic_block *)
gcc_edge_get_src(gcc_edge *e);

GCC_API(gcc_basic_block *)
gcc_edge_get_dest(gcc_edge *e);

/* ...etc... */
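
As a sketch of the "embed one of gcc's real types as its single field"
idea, the following shows how a few of the accessors above might be
implemented. The "fake_internal_" structs are stand-ins invented for
this example and do NOT reflect GCC's actual internal layout; GC
integration is ignored entirely:

```c
#include <assert.h>
#include <stddef.h>

#define GCC_API(RETURN_TYPE) extern RETURN_TYPE

/* Stand-ins for the compiler's internal types (hypothetical). */
struct fake_internal_bb {
    int index;
};
struct fake_internal_edge {
    struct fake_internal_bb *src;
    struct fake_internal_bb *dest;
};

typedef struct gcc_basic_block gcc_basic_block;
typedef struct gcc_edge gcc_edge;

/* In a real library these definitions would stay out of the public
   header, so that the types remain opaque to plugins. */
struct gcc_basic_block { struct fake_internal_bb inner; };
struct gcc_edge { struct fake_internal_edge inner; };

GCC_API(int) gcc_basic_block_get_index(const gcc_basic_block *bb);
GCC_API(gcc_basic_block *) gcc_edge_get_src(gcc_edge *e);
GCC_API(gcc_basic_block *) gcc_edge_get_dest(gcc_edge *e);

int gcc_basic_block_get_index(const gcc_basic_block *bb)
{
    assert(bb != NULL);
    return bb->inner.index;
}

/* The wrapper's only field is the internal struct, so a pointer to
   that field can be converted back to a pointer to the wrapper.  This
   sketch assumes every internal block is stored inside a wrapper. */
gcc_basic_block *gcc_edge_get_src(gcc_edge *e)
{
    assert(e != NULL);
    return (gcc_basic_block *)e->inner.src;
}

gcc_basic_block *gcc_edge_get_dest(gcc_edge *e)
{
    assert(e != NULL);
    return (gcc_basic_block *)e->inner.dest;
}
```

A plugin would then only ever hold "gcc_edge *" and "gcc_basic_block *"
values and go through the accessors, e.g.
gcc_basic_block_get_index(gcc_edge_get_dest(e)), leaving the library
free to change the internal representation.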


Again, these opinions are my own, and not those of Red Hat.

Hope this is constructive.
Dave
