Aye up Basile,
Thanks for wading through my gibberish :-)
*Differences with other proposals.*
I'll have a stab at some differences between this system and the
others. But, this is going to be a bit difficult since I haven't seen
them all :-)
*Separating Plugin system from application*
Libplugin ships as a library. Apart from a few lines of code in
toplev.c, the only other changes to GCC will be refactorings and
maybe calling a few functions through pointers.
I think it's important to separate the plugin system from the
application. Doing plugins well, IMO, requires a lot of code. It
shouldn't be spread through the app. It also cleanly separates
plugin mechanism from the actual extensions the app wants.
Finally, plugins have to be extensible too. They should really be on
a nearly equal footing with the app. Otherwise plugin developers
who want their plugins to be extensible will need to reimplement their
own extensibility system.
*Pull vs push*
Libplugin has a 'push' architecture, not a 'pull' one. What I mean
is that the system pushes plugin awareness onto the application
rather than requiring the application to call out to the plugin
system all the time.
Here's an example of that. In GCC, passes have execute and gate
functions which are already function pointers. With libplugin you
can make these replaceable/extensible/event-like without changing a
single line of code in GCC.
An external plugin, the "gcc-pass-manager" plugin, tells the world
that it has a join point for each gate and execute function of every
pass in the system.
A quick aside on join points. Suppose you have a function
    int myHeuristic( basic_block bb, rtx insn ) {
        // blah, blah
        return x;
    }
If we rename that function to myHeuristic_default and set up a
function pointer with the original name:
    static int myHeuristic_default( basic_block bb, rtx insn ) {
    ... }
    int ( *myHeuristic )( basic_block bb, rtx insn ) =
        myHeuristic_default;
Now we can use the heuristic unchanged in the code.
But if we tell libplugin that that is a join point with
id="my-heuristic" (in the XML for some plugin) it will create
1. An event called "my-heuristic.before" with signature
"void (basic_block, rtx)"
2. A replaceable function stack called "my-heuristic.around"
with signature "int (basic_block, rtx)"
3. An event called "my-heuristic.after" with signature
"void (int, basic_block, rtx)"
If anyone extends any of those, then the function pointer,
myHeuristic, will be replaced with a dynamically built function
which does, roughly:
    int myHeuristic_dynamic( basic_block bb, rtx insn ) {
        // call listeners to before
        foreach f in my-heuristic.before.eventHandlers {
            f( bb, insn );
        }
        // do the behaviour of the heuristic
        top = my-heuristic.around.topOfAdviceStack;
        // top is initially myHeuristic_default unless someone
        // overrode it
        // top can also access the rest of the advice stack, but
        // I ignore that here
        int rval = top( bb, insn );
        // call listeners to after
        foreach f in my-heuristic.after.eventHandlers {
            f( rval, bb, insn );
        }
        return rval;
    }
It then sets myHeuristic = myHeuristic_dynamic. Note that if no
one listens to the events or pushes advice on the around stack,
then the original function pointer isn't changed - no
performance cost.
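To make the aside concrete, here is a minimal hand-written sketch in C of the same before/around/after idea. It is an illustration only, not libplugin's code: libplugin builds the wrapper dynamically, whereas here it is an ordinary static function, and the fixed-size listener arrays, the listen_before/listen_after/push_around helpers, the two example handlers and the int stand-ins for basic_block and rtx are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's basic_block and rtx types. */
typedef int basic_block;
typedef int rtx;

typedef void (*before_fn)(basic_block, rtx);
typedef int  (*around_fn)(basic_block, rtx);
typedef void (*after_fn)(int, basic_block, rtx);

#define MAX_LISTENERS 8

static before_fn before_handlers[MAX_LISTENERS];
static size_t    n_before;
static after_fn  after_handlers[MAX_LISTENERS];
static size_t    n_after;

/* The original behaviour, renamed as described in the text. */
static int myHeuristic_default(basic_block bb, rtx insn) { return bb + insn; }

/* Top of the "around" advice stack; starts out as the default. */
static around_fn around_top = myHeuristic_default;

/* The pointer the application calls through. */
int (*myHeuristic)(basic_block, rtx) = myHeuristic_default;

/* Wrapper that is installed only once somebody extends the join point. */
static int myHeuristic_dynamic(basic_block bb, rtx insn) {
    for (size_t i = 0; i < n_before; i++)
        before_handlers[i](bb, insn);
    int rval = around_top(bb, insn);
    for (size_t i = 0; i < n_after; i++)
        after_handlers[i](rval, bb, insn);
    return rval;
}

/* Hypothetical registration helpers; each one swaps in the wrapper. */
void listen_before(before_fn f) {
    assert(n_before < MAX_LISTENERS);
    before_handlers[n_before++] = f;
    myHeuristic = myHeuristic_dynamic;
}

void listen_after(after_fn f) {
    assert(n_after < MAX_LISTENERS);
    after_handlers[n_after++] = f;
    myHeuristic = myHeuristic_dynamic;
}

void push_around(around_fn f) {
    around_top = f;
    myHeuristic = myHeuristic_dynamic;
}

/* Example listener: counts how often the heuristic is called. */
int calls_seen;
void count_call(basic_block bb, rtx insn) { (void)bb; (void)insn; calls_seen++; }

/* Example around advice: replaces the heuristic's result entirely. */
int always_seven(basic_block bb, rtx insn) { (void)bb; (void)insn; return 7; }
```

Until one of the three helpers is called, myHeuristic still points straight at myHeuristic_default, so existing call sites pay nothing extra. After listen_before(count_call) the same call sites go through the wrapper without being edited.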
Now the dynamic functions are pushed onto each pass's gate or
execute only if someone wants to extend them. Not one line of code
was changed in GCC. This is what I mean by push not pull.
Consider the alternative, which I call 'pull' because it has to pull
plugin awareness from the system. It would require each pass and
gate to check whether anyone was interested, which means lots of
changes to the code. Or every call site would have to do it, which is
similarly unpleasant for most uses.
This is great when you already have function pointers. If you don't
you have to make only minimal changes. Your code remains efficient
if no one extends it.
*Scalability and Granularity*
The system is very scalable. Really this is due to the push
architecture.
Consider if events were implemented by something like this. A
single function:
void firePluginEvent( int eventId, void* data );
Every event would be fired by calling through this one function.
Plugins would register a callback function.
This is fine when you only have a few events but look what happens
when you have very fine grained events happening millions of times
during the compilation.
Firstly, most plugins are only going to be interested in a few of
the events. Why ask them to do the filtering when you don't need
to? Each plugin then pays the cost.
Secondly, for each event firing we have one callback for each
plugin, even if no plugin is interested in this event. If
the event was SCHEDULE_INSN, or more frequent, that might be a bit painful.
One final point about that style is that now you have to define a
struct for your data and your code looks less like what you meant.
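The costs of that single-dispatcher style can be seen in a small C sketch. This is the design being argued against, not libplugin. Only firePluginEvent's signature comes from the text above; the event ids, the registerPluginCallback helper, the payload struct and the counters are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event ids. */
enum { EVENT_SCHEDULE_INSN = 1, EVENT_PASS_DONE = 2 };

/* Arguments have to be packed into a struct to travel through void*. */
struct schedule_insn_data {
    int bb_index;
    int insn_uid;
};

typedef void (*plugin_callback)(int eventId, void *data);

#define MAX_PLUGINS 8
static plugin_callback callbacks[MAX_PLUGINS];
static size_t n_callbacks;

unsigned long callbacks_invoked; /* every call made into any plugin */
unsigned long events_handled;    /* calls a plugin actually cared about */

void registerPluginCallback(plugin_callback cb) {
    assert(n_callbacks < MAX_PLUGINS);
    callbacks[n_callbacks++] = cb;
}

/* Every event, however frequent, is broadcast to every plugin. */
void firePluginEvent(int eventId, void *data) {
    for (size_t i = 0; i < n_callbacks; i++) {
        callbacks_invoked++;
        callbacks[i](eventId, data);
    }
}

/* A plugin that only cares about EVENT_PASS_DONE still hears everything
   and has to do its own filtering. */
void pass_done_plugin(int eventId, void *data) {
    (void)data;
    if (eventId != EVENT_PASS_DONE)
        return;
    events_handled++;
}
```

Registering pass_done_plugin and then firing EVENT_SCHEDULE_INSN a thousand times calls into the plugin a thousand times for nothing. With per-event listener lists those calls would never be made.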
With libplugin, on the other hand, you only pay where you have to.
If no one uses an event then no listeners are added to it. Also if
a plugin only listens to one event, that's the only event it'll ever
have to hear about. This way you can afford to have very fine grained
extension points.
Because it's scalable, we can have very frequently called events or
heuristics be replaceable. You can have any granularity you want.
*Multiple cooperating plugins*
I think some of the proposals don't allow multiple plugins or
plugins aren't able to be extended in the same way that the
application is. In libplugin you can have lots of plugins all
depending on each other. Plugins can provide extension points as
well as the application - this means it isn't just a matter of the
application deciding what's important and everyone else having to
make do.
In some senses, this is the difference between a plugin system and
loading a few shared libraries. A plugin system provides a
holistic framework for building and using plugins.
*Separation of plugins from shared libraries*
If a plugin is only a shared library, then you have to load that
library to find out anything about it. For example, if it isn't for
your particular application/version then that might cause problems.
To add any metadata you have to do odd things, like making the path
to the library be meaningful - it might work for one bit of metadata
but quickly becomes difficult.
The other problem with plugins being only shared libraries is that
you always have to write C code. What if you just want something
really simple? Do you always have to go to the effort of writing a
shared library, compiling it, putting it in the right place, and so on?
Libplugin, though, makes every plugin an XML file, not a shared
library. The plugin might use a shared library but it might not.
It can also use more than one.
This way we gain several advantages. Firstly, metadata is easy: it
goes in the XML file. This means that we can have required and
optional plugins, plugins for one application/version and not another,
and all sorts of other stuff. And for none of it do we have to load any
shared libraries.
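As an illustration of the kind of check that metadata makes possible before any shared library is loaded, here is a minimal C sketch of a version test. It assumes a plugin declares a required version prefix such as "4.3" that should accept any 4.3.X compiler. How libplugin actually interprets its version metadata isn't described here, so the function name and matching rule are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `actual` (e.g. "4.3.1") falls within the series named by
   `required` (e.g. "4.3"), 0 otherwise. The match must end on a component
   boundary, so "4.3" does not accept "4.30.0". */
int version_matches(const char *required, const char *actual) {
    assert(required != NULL && actual != NULL);
    size_t n = strlen(required);
    if (strncmp(required, actual, n) != 0)
        return 0;
    return actual[n] == '\0' || actual[n] == '.';
}
```

A loader could run a test like this on the text of the plugin file and skip the plugin entirely on a mismatch, without ever calling dlopen.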
It also means that the user can get lots of use out of the plugin
system without writing a single line of C code.
Here's an example. I use my plugin system mostly for my iterative
compilation/machine learning experiments. I write a plugin for each
element of the iteration space - each such plugin defines the
compilation strategy for the particular point in the space. Here's
one of those files (annotated and modded a bit):
<?xml version='1.0'?>
<!-- Only work on GCC 4.3.X -->
<?gcc version='4.3'?>
<!-- A plugin that's only loaded if requested (lazy) -->
<plugin id='strategy' lazy='true'>
<!-- Bring in another plugin -->
<!-- I want this plugin to print out each loop as it's
unrolled - for debugging -->
<!-- Actually this plugin doesn't have any C code either, it
uses -->
<!-- plugin "message" to do logging -->
<!-- plugin "gcc-rtl-unroll-and-peel-loops" to log the
particular event -->
<requires plugin="gcc-print-unrolled-loops"/>
<!-- Another one for debugging. Also has no C code. -->
<!-- This one prints every pass and the current function to
an XML file -->
<requires plugin="gcc-print-passes"/>
<!-- I need to change the command line. I don't want to
mess with any benchmark's -->
<!-- makefile, and fortunately the "command-line" plugin
lets me change arguments -->
<!-- easily. -->
<extension point="command-line.modify">
<!-- First remove any args matching -O*, -finline* or
-funroll* -->
<remove><arg>-O*</arg></remove>
<remove><arg>-finline*</arg></remove>
<remove><arg>-funroll*</arg></remove>
<!-- Add arguments -O3, -funroll-all-loops, -fno-inline -->
<insert><arg>-O3</arg><arg>-funroll-all-loops</arg><arg>-fno-inline</arg></insert>
</extension>
<!-- I want to record the number of cycles from each
function -->
<!-- Plugin "gcc-perfmon" does that -->
<extension point="gcc-perfmon.settings">
<!-- I don't want to include functions input_dsp or
output_dsp, they're just a bunch -->
<!-- of I/O setup stuff. -->
<instrument>
<exclude function="*put_dsp"/>
</instrument>
<!-- I don't want to pause the count around some
functions -->
<pause>
<exclude function="acos"/>
...
<exclude function="vsprintf"/>
</pause>
</extension>
<!-- Change loop unroll factors -->
<!-- This replaces the default heuristic with one which
takes values from this file -->
<!-- We could also have replaced the heuristic ourselves,
but then we'd have to have -->
<!-- a C function to replace it with. -->
<extension point='gcc-rtl-unroll-and-peel-loops.override'>
<!-- For loop "adpcm.c/adpcm_decoder/1" unroll it 5
times - work out the best type for me -->
<loop main-input-file='adpcm.c' function='adpcm_decoder'
number='1' times='5'/>
<!-- For loop "rawdaudio.c/main/1" unroll it 2 times -
work out the best type for me -->
<loop main-input-file='rawdaudio.c' function='main'
number='1' times='2'/>
<!-- For everything else, leave it to GCC (could remove
this, it's the default) -->
<loop main-input-file='*' function='*' number='*'
times='gcc-default'/>
</extension>
</plugin>
So, I think you can see that you can do quite a lot without having
to have a shared library. As plugins become more capable you should
be able to do more and more without writing any C code.
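For a feel of what an extension like "command-line.modify" asks for, here is a minimal C sketch of the remove/insert step using POSIX fnmatch for the glob patterns. The real "command-line" plugin's implementation isn't shown in this message, so the function names, the in-place compaction and the choice to append inserted arguments at the end are all assumptions.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Drop every argument that matches one of the glob patterns, compacting
   argv in place. argv[0] (the program name) is always kept.
   Returns the new argument count. */
size_t remove_matching_args(const char **argv, size_t argc,
                            const char *const *patterns, size_t n_patterns) {
    size_t out = argc > 0 ? 1 : 0;
    for (size_t i = 1; i < argc; i++) {
        int matched = 0;
        for (size_t p = 0; p < n_patterns && !matched; p++)
            matched = (fnmatch(patterns[p], argv[i], 0) == 0);
        if (!matched)
            argv[out++] = argv[i];
    }
    return out;
}

/* Append extra arguments; the caller guarantees argv has room for
   `capacity` entries. Returns the new argument count. */
size_t append_args(const char **argv, size_t argc, size_t capacity,
                   const char *const *extra, size_t n_extra) {
    assert(argc + n_extra <= capacity);
    for (size_t i = 0; i < n_extra; i++)
        argv[argc++] = extra[i];
    return argc;
}
```

Applied to "gcc -O2 -funroll-loops -c adpcm.c -finline-functions" with the three remove patterns and three inserted arguments from the XML above, this leaves "gcc -c adpcm.c -O3 -funroll-all-loops -fno-inline".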
*Passes*
Here's a quick description about what happens with passes. I've split
the discussion in two. The first, short part describes what will go
into the next release, in mid-October(ish). The next part is working
but I'm not happy with it yet, so it will wait.
*Current gcc-pass-manager*
This plugin provides a number of things. First, as discussed
already, there are join points for every pass's gate and execute
function. This allows you to turn passes on or off, find out what
happened or to completely change the behaviour of a pass. BTW, the
pass manager also creates names for those passes which don't already
have them.
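Turning a pass off through its gate can be pictured with a small C sketch. The struct below is a hypothetical, cut-down pass descriptor, not GCC's real one, and disable_pass, run_passes and noop_execute are names invented for the example. In this sketch a NULL gate means "always run".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down pass descriptor with only the fields used here. */
struct pass {
    const char *name;
    bool (*gate)(void);
    unsigned int (*execute)(void);
    struct pass *next;
};

/* Replacement gate: advice that vetoes the pass. */
static bool gate_never(void) { return false; }

/* Swap the gate of every pass called `name`; returns how many changed. */
size_t disable_pass(struct pass *list, const char *name) {
    assert(name != NULL);
    size_t changed = 0;
    for (struct pass *p = list; p != NULL; p = p->next) {
        if (p->name != NULL && strcmp(p->name, name) == 0) {
            p->gate = gate_never;
            changed++;
        }
    }
    return changed;
}

/* Walk the list the way a simple pass manager might; returns how many
   passes got past their gate. */
size_t run_passes(struct pass *list) {
    size_t executed = 0;
    for (struct pass *p = list; p != NULL; p = p->next) {
        if (p->gate != NULL && !p->gate())
            continue;
        if (p->execute != NULL)
            p->execute();
        executed++;
    }
    return executed;
}

/* Example pass body that does nothing. */
unsigned int noop_execute(void) { return 0; }
```

The code that walks the pass list never changes. Only the function pointer stored in the pass does, which is the same trick as the join point wrapper.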
There are also join points around execute_one_pass and
execute_one_ipa_transform_pass (I'm still on 4.3.1). These allow
you to find out what happened to each pass, rather than having to
listen to the events of individual passes. You can also change the
way those functions work.
*Next gcc-pass-manager*
The next version also allows you to add passes. First, you can just add to the
managed passes without putting a pass into the compilation order.
<extension point="gcc-pass-manager.add-pass">
<pass symbol="pass-symbol-in-shared-lib"/>
</extension>
Or you can add one after or before another pass. At the moment this
happens only to the first occurrence of the other pass. This is one
thing I don't like.
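The "first occurrence only" behaviour is easy to see in a C sketch over a singly linked pass list. The struct and function below are hypothetical simplifications for illustration. The real pass tree also has sub-passes, which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened pass list: just a name and a next pointer. */
struct pass {
    const char *name;
    struct pass *next;
};

/* Insert new_pass after the FIRST pass called `after`.
   Returns 1 on success, 0 if no pass has that name. Any later passes
   with the same name are left untouched. */
int insert_pass_after(struct pass *head, const char *after,
                      struct pass *new_pass) {
    assert(after != NULL && new_pass != NULL);
    for (struct pass *p = head; p != NULL; p = p->next) {
        if (p->name != NULL && strcmp(p->name, after) == 0) {
            new_pass->next = p->next;
            p->next = new_pass;
            return 1;
        }
    }
    return 0;
}
```

With a list "dce, cse, dce", inserting after "dce" gives "dce, my-pass, cse, dce". The second dce gets no copy. That is the limitation described above.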
<extension point="gcc-pass-manager.insert-pass"
after="pass-name"> <!-- or before="pass-name" -->
<pass symbol="pass-symbol-in-shared-lib"/>
<!-- or if already registered -->
<pass name="pass-name"/>
</extension>
You can also remove passes - again I'm not happy with this yet.
<extension point="gcc-pass-manager.remove-pass" name="pass-name"/>
The above control the default pass ordering. You can also set up
particular pass orders for certain functions. I'm still not happy
with it and it doesn't do IPA passes (though I think I can handle that).
<extension point="gcc-pass-manager.set-pass-order">
<!-- Specify which functions to set pass order for -->
<!-- Glob patterns can be used -->
<function main-input-file="glob-pattern" name="glob-pattern">
<!-- Do all the default passes until a given pass name -->
<default to="pass-name"/>
<!-- Do some particular passes -->
<pass name="pass-name"/>
<pass name="pass-name"/>
...
<!-- Do a bunch of passes from the default pass order -->
<default from="pass-name" to="pass-name"/>
...
<!-- Do some particular passes -->
<pass name="pass-name"/>
<pass name="pass-name"/>
<!-- Do passes from a given pass to the end of the
compilation -->
<default from="pass-name"/>
</function>
<!-- If you're thinking that writing that list for each -->
<!-- function (if there's no good glob pattern) is going -->
<!-- to be painful, you'd be right. Except that we use -->
<!-- XInclude, too, so you can just repeatedly include -->
<!-- the pass list from another file -->
</extension>
The things I'm not happy with come from two sources. One is the
ability to have multiple copies of a pass in the pass tree. The other
is the tree flattening I do for extension point
gcc-pass-manager.set-pass-order. I need to think about it for a while.
Note that the above XML format is for convenience. You could write
your own code and replace how passes are done completely if you want.
*Licensing*
I don't know anything about licensing, but we could do something
similar to the approach that Joern suggested. We could only load
plugins that included the GPL or other approved OSS license at the
top of the file. The plugin would then declare that it and
everything it used was good. I don't think people could avoid that
declaration. Maybe I'm wrong.
What do you all think? Is this interesting?
Cheers,
Hugh.
Basile STARYNKEVITCH wrote:
Hugh Leather wrote:
Aye up all,
I've now been reading through some of the list archive. Some of the
posts were about how to tell GCC which plugins to load. I thought
I'd tell you how libplugin does it.
Thanks for the nice explanation. I'm not sure I understand exactly
how libplugin deals with adding passes; apparently, the entire pass
manager (ie gcc/passes.c) has been rewritten or enhanced. Also, I did
not understand the exact conceptual differences between libplugin &
other proposals. Apparently libplugin is much more ambitious.
So we now have many plugin proposals & experiments. However, we do
know that there are some legal/political/license issues on these
points (with the GCC community rightly wanting as hard as possible to
avoid proprietary plugins), that some interaction seems to happen
(notably between Steering Committee & FSF), that the work is going
slowly (because of lack of resource & labor & funding? at FSF).
My perception is that the issues are not mostly technical, but still
political (and probably, as Ian Taylor mentioned it in
http://gcc.gnu.org/ml/gcc/2008-09/msg00442.html a lack of lawyer or
other human resources at FSF, which cost much more than any reasonable
person could afford individually). I actually might not understand why
exactly plugins are not permitted by the current GCC licenses.
What I don't understand is
* what exactly do we call a plugin? I feel (but I am not a lawyer)
that (on linux) it is any *.so file which is fed to dlopen. I'm not
able to point what parts of the GCC license prohibit that (I actually
hope that nothing prohibits it right now, if the *.so is compiled from
GPLv3-ed FSF copyrighted code. the MELT branch is doing exactly that
right now).
* will the runtime license be working for Christmas 2008. [some
messages made me think that not, it is too much lawyer work; other
messages made me a bit more optimistic; I really am confused]. Of
course, I don't want any hard date, but I am in the absolute darkness
on the actual work already done on improving the runtime license, and
even more on what needs to be fixed. Also, I have no idea of the work
involved in writing new licenses (I only know that the GPLv3 effort
lasted much more than one year). Did I say that I am not a lawyer, and
not understanding even the basic principles of US laws (or perhaps
even French ones)?
* what kind of intrusiveness do we want for the plugin machinery. Do
we want it to be clean and hence to touch a lot of files (in
particular the details of passes & the pass manager), or do we first
want some quick and dirty plugin trick merged into the trunk, even if
it is imperfect?
* what is the plugin machinery useful for? Only adding optimisation
passes, or much more ambitious (adding new front ends, back ends,
targets)?
* what is the interaction between the plugin machinery & the rest of
GCC (e.g. GGC, dump files, )
* what is the granularity plugins are wanted or needed for? Only whole
passes, or something smaller than that (e.g. some specific functions
inside specific passes)?
* who really want plugins to happen quick, and which company would
invest money [not only code] on that?
* what host system do we want the plugin to work with? Is libtool dyn
loader enough? Could every non static symbol inside cc1 be visible to
the plugin?
* do we really want one single (fits all) plugin machinery inside GCC?
My feeling is that a lot of various technical efforts have already
been put into plugins, but that the future runtime license may (or
not) impact technicalities (perhaps making some proposed technical
solutions impossible). I really don't understand what is the hard
limit, i.e. what the FSF or the Steering Committee wants to avoid
exactly (obviously proprietary plugins implementing new machine
targets are unwanted, but what else; is the goal to only permit FSF
copyrighted GPLed plugins; what would be the review policy of code
going into plugins?)?
I've got no idea of how hard it would be to get any plugin system
accepted into the GCC trunk, and when that work could start
(i.e. when to send plugin patches to gcc-patches@). I tend to believe
that it is the main issue now. Are plugin patches supposed to be welcome
-on the gcc-patches@ mailing list, for trunk acceptance- when GCC goes
back in stage1? Will the first plugin patches (submitted to
gcc-patches@ for acceptance into trunk) be huge or tiny patches?
Technically both are possible (of course with different goals &
features).
I even don't know what legally a plugin is. For instance, in my MELT
branch code is indeed dlopen-ed, but [currently] the C code of the
plugin is generated (by the plugin itself) from MELT lisp-like files,
which are all inside the MELT branch (GPL-ed, FSF copyrighted) Perhaps
that does not even count, from a legal point of view, as a plugin? [I
really hope I am not doing unknowingly illegal things on the MELT
branch; to calm everyone, of course every line of code there is GPLv3
licenced, FSF copyrighted - even generated code... so I hope that I am
not guilty... :-) ].
My guess is that the most visible effect of plugins could be perhaps a
tiny side effect: some code could be practically used in gcc, with GPL
licence (or LGPL?) inside GCC [since it is dlopen-ed] without being
FSF copyrighted, but perhaps the goal of the steering committee is to
avoid that.
And I even don't understand who is deciding what on the plugin issues
& the runtime license issue.
Regards.