On Fri, 2012-03-16 at 16:17 +0100, Ludovic Courtès wrote:
> Hello,
>
> Richard Guenther <richard.guent...@gmail.com> skribis:
>
> > 2012/3/16 Ludovic Courtès <ludovic.cour...@inria.fr>:
>
> [...]
>
> > Well, if you invent new paradigms
>
> Hmm, I didn’t invent anything here.
>
> > in your plugin that are not used by GCC itself but use GCC internals
> > (which all plugins have to do ...) then I know where the problem lies
> > ;) I suppose you would be better served by some of the more high-level
> > plugin packages that let you write GCC plugins in python or lisp?
>
> Sure, even Guile-GCC [0]. :-)
>
> (Speaking of which, since it mostly uses Guile’s dynamic FFI to
> interface with GCC, and thus dlsyms GCC’s symbols, it will also suffer
> from the transition. How does the Python plug-in handle that? Is it
> written in ANSI C, or does it use ctypes?)

The python plugin is written in plain C, much of it autogenerated. It mostly dynamically links to GCC, but uses weak references on some of GCC's symbols (e.g. when using symbols that only exist in the C++ frontend), so we'll see how well that works.

I have to admit to being nervous about the GCC change to C++. Elaborating on this further may degenerate into a rant, so here goes, and I apologize in advance.

I should stress at this point that I don't speak for Red Hat here; this is just my own opinion (based on having spent much of the last year working on the gcc python plugin, and having spent many years involved in free software).

The C++ move is making things more difficult for plugin authors, but that's a symptom of not having a plugin API.

It's perfectly possible to write clean, maintainable code in C that implements object-oriented ideas such as information hiding, dynamic dispatch according to type, and so on. See e.g. CPython or GObject for examples.

What I'm really hoping for from GCC is a move towards a collection of libraries that can be embedded in (license-compatible) apps: LLVM is gaining ground for the use case of programs that need JIT-compilation (e.g. the X server, or a JVM). I appreciate that JIT compilation has different characteristics to a classic ahead-of-time compiler.

C++ is likely to make it more difficult to embed GCC into such apps (e.g. nasty ordering issues for globals with non-empty constructors; ABI differences, e.g. relating to exception handling; extra complexity of linking); every large free software project that uses C++ chooses a subset of the language to use (this could be enforced with a plugin to the compiler!)

What use cases would a rearchitected GCC have?
I may not win friends here by posting this link, but, to be frank, see:
  http://llvm.org/ProjectsWithLLVM/

Some examples:
* the X server: llvmpipe is a software implementation of OpenGL: compiling OpenGL shader programs into machine code for the CPU for the case where the GPU isn't up to the job (or for which the vendor isn't providing specs).
* IcedTea is using LLVM to add JIT compilation for Java code (see http://icedtea.classpath.org/wiki/ZeroSharkFaq#Why_was_Shark_written.3F )
* JIT compilation within dynamic languages: e.g. Unladen Swallow (though that's a dead project; most of the time and energy was spent fixing bugs in LLVM)
* static analysis tools
* anywhere we're writing plugins. Building standalone tools and just linking is often preferable.

See e.g.:
  http://blogs.gnome.org/jessevdk/2011/09/10/gedit-clang-plugin-progress/
where (if I'm reading it right) the gedit editor has gained a plugin that embeds parts of LLVM.

Proposed outcome
----------------
GCC becomes a family of shared libraries that can be dynamically and statically linked. The executables become thin wrappers that invoke these libraries (perhaps statically linked to minimize startup costs).

Note that I'm *not* proposing a license change. The code that uses these libraries would need to be license-compatible with GCC.

The ideal outcome would be to dynamically link this code into a multi-threaded process, in which multiple dynamically-linked libraries within the process are linked against the GCC library, working independently, without each one knowing that the others are using the library.

Such a reorganization could be worthy of a bump to the major-release id (e.g. "gcc 5")

Current architectural issues
----------------------------
GCC is currently a large body of single-threaded code that is compiled and linked into multiple large executables.

Issues:
* lots of global state: global variables, static locals, etc.
* related to the above: functions often have side-effects
* memory management: custom allocator, with garbage collection. I'm not sure how the GC code locates roots on the stack, and I suspect that there's currently an assumption that there aren't multiple threads within the process
* process termination: code can call abort() to handle errors, rather than being able to report back an error state. It may be acceptable for a standalone executable to call "abort", but it's not acceptable when a library that you're calling into does (e.g. if the X server is using such a library, your desktop goes away...)
* actual machine-code generation is currently done by the GNU assembler: would need to turn that into a shared library as well (turning GNU assembler into a thin wrapper around that library)
* no namespacing: seemingly arbitrary naming convention for symbols. Would want to add some namespace prefix to the public symbols (vars and fns), and to the types.
* JIT compilation vs AOT compilation: when an app embeds a JIT it would want to assemble a more appropriate set of compilation passes dynamically, I suspect, compared to the full set of AOT passes that GCC currently has.
* plugin architecture: how would plugins relate to this rearchitected GCC?
* option-handling is currently very tied to the command line. Multiple uses of the library might want different option sets.
* threading? potentially one could punt this by adding a "Big Compiler Lock": one big mutex guarding all usage of the APIs.
* the political will to do this
* actually implementing the thing!

It seems that GCC has provided an API for registering plugins, but no API for the plugins to then actually use...

Perhaps the pain of the C++ move would be alleviated by having an actual C API for plugins to use?

I started writing a possible API for plugins, the idea being to port my python plugin to this as a middle layer, but it strikes me that this could also be used for the embedding case.
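
As a rough illustration of two items from the issues list above (reporting an error state instead of calling abort(), and punting on threading with a "Big Compiler Lock"), here is a minimal sketch in plain C using POSIX threads. Everything in it is hypothetical: gcc_status, gcc_big_lock and gcc_compile_string are names invented for this sketch and do not exist in GCC, and the actual compilation work is replaced by a counter standing in for global state.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical status codes; nothing like this exists in GCC today. */
typedef enum gcc_status {
    GCC_STATUS_OK = 0,
    GCC_STATUS_INVALID_ARGUMENT,
    GCC_STATUS_INTERNAL_ERROR
} gcc_status;

/* The "Big Compiler Lock": one mutex guarding all usage of the API. */
static pthread_mutex_t gcc_big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for compiler internals that rely on global state. */
static int gcc_units_compiled = 0;

/* Hypothetical entry point: reports an error state to the caller
   instead of calling abort(), so the embedding process survives. */
gcc_status gcc_compile_string(const char *source)
{
    gcc_status status = GCC_STATUS_OK;

    if (source == NULL)
        return GCC_STATUS_INVALID_ARGUMENT;

    if (pthread_mutex_lock(&gcc_big_lock) != 0)
        return GCC_STATUS_INTERNAL_ERROR;

    /* Real work would go here; the sketch only touches global state. */
    gcc_units_compiled++;

    if (pthread_mutex_unlock(&gcc_big_lock) != 0)
        status = GCC_STATUS_INTERNAL_ERROR;

    return status;
}
```

With something along these lines, a caller such as an X server could check the returned status and fall back gracefully, and several independent users within one process would simply be serialized by the lock.
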
So perhaps the first step might be to implement the plugin API, and that could evolve into the inter-library API that the different parts of a more modular GCC could use to talk to each other?

The proposed API might look like this:

/*
  Pure C for maximum compatibility

  All macros begin with a "GCC_" prefix

  All symbols begin with a "gcc_" prefix with _ separators, though I
  happen to prefer the CPython style (e.g. "GccBasicBlock_GetIndex");
  bikeshed away!
  (You may only call such a symbol when you have the Big GCC Lock?)

  All types begin with a "gcc_" prefix (again, I'd prefer CPython style
  e.g. "struct GccBasicBlock").

  How acceptable is it to autogenerate parts of the API?  (this is what I
  do in my python plugin; naturally I use python for this).
*/

/* Compatibility macros: */
#define GCC_API(RETURN_TYPE) extern RETURN_TYPE

/*
  All types are opaque; internally these might simply embed one of gcc's
  real types as its single field; integration with GC could be
  interesting though:
*/
typedef struct gcc_cfg gcc_cfg;
typedef struct gcc_basic_block gcc_basic_block;
typedef struct gcc_edge gcc_edge;

/* Declarations: control flow graphs */

/* gcc_cfg: */
GCC_API(gcc_basic_block *) gcc_cfg_get_entry(gcc_cfg *cfg);
GCC_API(gcc_basic_block *) gcc_cfg_get_exit(gcc_cfg *cfg);

/* gcc_basic_block: */
GCC_API(int) gcc_basic_block_get_index(const gcc_basic_block *bb);

/* gcc_edge: */
GCC_API(gcc_basic_block *) gcc_edge_get_src(gcc_edge *e);
GCC_API(gcc_basic_block *) gcc_edge_get_dest(gcc_edge *e);

/* ...etc... */

Again, these opinions are my own, and not those of Red Hat.

Hope this is constructive.
Dave
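
For illustration, the "opaque type embedding a real type as its single field" idea in the API above could look roughly like the sketch below. This is only an assumption about one possible implementation: struct internal_basic_block is a made-up stand-in, not GCC's actual basic-block structure, and only gcc_basic_block_get_index from the proposal is implemented.

```c
#include <stddef.h>

/* Made-up stand-in for one of GCC's real internal types; it exists
   only to keep this sketch self-contained. */
struct internal_basic_block {
    int index;
};

/* Public side: callers only ever see an opaque type. */
typedef struct gcc_basic_block gcc_basic_block;

#define GCC_API(RETURN_TYPE) extern RETURN_TYPE

GCC_API(int) gcc_basic_block_get_index(const gcc_basic_block *bb);

/* Private side: the wrapper embeds the internal type as its single
   field, so the layout stays hidden from API users. */
struct gcc_basic_block {
    struct internal_basic_block inner;
};

/* Accessor: the only way for API users to read the index. */
int gcc_basic_block_get_index(const gcc_basic_block *bb)
{
    return bb->inner.index;
}
```

In a real split, the struct definition and the accessor body would live in GCC's own sources, while plugins would include only the typedef and the GCC_API declarations.
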