Re: Defining a common plugin machinery

Hugh Leather Wed, 01 Oct 2008 13:13:43 -0700

Aye up Basile,

   Thanks for wading through my gibberish :-)



*Differences with other proposals.*

I'll have a stab at some differences between this system and the others.  But this is going to be a bit difficult since I haven't seen them all :-)


   *Separating Plugin system from application*
   Libplugin ships as a library.  Apart from a few lines of code in
   toplev.c, the only other changes to GCC will be refactorings and
   maybe calling a few functions through pointers.

   I think it's important to separate the plugin system from the
   application.  Doing plugins well, IMO, requires a lot of code.  It
   shouldn't be spread through the app.  It also cleanly separates
   the plugin mechanism from the actual extensions the app wants.

   Finally, plugins have to be extensible too. They should really be on
   a nearly equal footing with the app.  Otherwise plugin developers
   who want the plugins to be extensible will need to reimplement their
   own extensibility system.


   *Pull vs push*
   Libplugin has a 'push' architecture, not a 'pull' one.  What I mean
   is that the system pushes plugin awareness onto the application
   rather than requiring the application to call out to the plugin
   system all the time.
   Here's an example of that.  In GCC, passes have execute and gate
   functions which are already function pointers.  With libplugin you
   can make these replaceable/extensible/event-like without changing a
   single line of code in GCC.
   An external plugin, the "gcc-pass-manager" plugin, tells the world
   that it has a join point for each gate and execute function of every
   pass in the system.

       A quick aside on join points.  Suppose you have a function

              int myHeuristic( basic_block bb, rtx insn ) {
                 // blah, blah
                 return x;
              }

       If we rename that function to myHeuristic_default and set up a
       function pointer with the original name:

          static int myHeuristic_default( basic_block bb, rtx insn ) { ... }
          int ( *myHeuristic )( basic_block bb, rtx insn ) = myHeuristic_default;

       Now we can use the heuristic unchanged in the code.

       But if we tell libplugin that that is a join point with
       id="my-heuristic" (in the XML for some plugin) it will create
           1. An event called "my-heuristic.before" with signature
              "void (basic_block, rtx)"
           2. A replaceable function stack called "my-heuristic.around"
              with signature "int (basic_block, rtx)"
           3. An event called "my-heuristic.after" with signature
              "void (int, basic_block, rtx)"
       If anyone extends any of those, then the function pointer,
       myHeuristic, will be replaced with a dynamically built function
       which does, roughly:

           int myHeuristic_dynamic( basic_block bb, rtx insn ) {
              // call listeners to before
              foreach f in my-heuristic.before.eventHandlers {
                 f( bb, insn );
              }

              // do the behaviour of the heuristic
              top = my-heuristic.around.topOfAdviceStack;
              // top is initially myHeuristic_default unless someone
              // overrode it
              // top can also access the rest of the advice stack, but
              // I ignore that here
              int rval = top( bb, insn );

              // call listeners to after
              foreach f in my-heuristic.after.eventHandlers {
                 f( rval, bb, insn );
              }
              return rval;
           }

       It then sets myHeuristic = myHeuristic_dynamic.  Note that if no
       one listens to the events or pushes advice on the around stack,
       then the original function pointer isn't changed - no
       performance cost.
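
The join point described above can be sketched in plain C.  This is only an illustration under stated assumptions, not libplugin's actual code: the typedefs stand in for GCC's basic_block and rtx, the names listen_before, listen_after, push_around and MAX_LISTENERS are hypothetical, and the "around" stack is reduced to a single replaceable slot.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for GCC's real types (assumption, for illustration only). */
typedef struct basic_block_def *basic_block;
typedef struct rtx_def *rtx;

typedef void (*before_fn)(basic_block, rtx);
typedef int  (*around_fn)(basic_block, rtx);
typedef void (*after_fn)(int, basic_block, rtx);

#define MAX_LISTENERS 8   /* hypothetical fixed limit to keep the sketch short */

static int myHeuristic_default(basic_block bb, rtx insn) {
    (void)bb; (void)insn;
    return 1;             /* placeholder for the real heuristic */
}

/* The application keeps calling through this pointer, unchanged. */
int (*myHeuristic)(basic_block, rtx) = myHeuristic_default;

/* State of the one join point in this sketch. */
static before_fn before_handlers[MAX_LISTENERS];
static size_t    n_before;
static after_fn  after_handlers[MAX_LISTENERS];
static size_t    n_after;
static around_fn around_top = myHeuristic_default;  /* single slot, not a stack */

/* Wrapper that is installed only once somebody extends the join point. */
static int myHeuristic_dynamic(basic_block bb, rtx insn) {
    for (size_t i = 0; i < n_before; i++)
        before_handlers[i](bb, insn);
    int rval = around_top(bb, insn);
    for (size_t i = 0; i < n_after; i++)
        after_handlers[i](rval, bb, insn);
    return rval;
}

void listen_before(before_fn f) {
    assert(n_before < MAX_LISTENERS);
    before_handlers[n_before++] = f;
    myHeuristic = myHeuristic_dynamic;   /* push the wrapper onto the app */
}

void listen_after(after_fn f) {
    assert(n_after < MAX_LISTENERS);
    after_handlers[n_after++] = f;
    myHeuristic = myHeuristic_dynamic;
}

void push_around(around_fn f) {
    around_top = f;
    myHeuristic = myHeuristic_dynamic;
}

/* Two example extensions a plugin might supply. */
int last_seen = -1;
void record_result(int rval, basic_block bb, rtx insn) {
    (void)bb; (void)insn;
    last_seen = rval;
}
int always_two(basic_block bb, rtx insn) {
    (void)bb; (void)insn;
    return 2;
}
```

Until one of the three registration functions is called, myHeuristic still points straight at myHeuristic_default, which is the "no performance cost" case.  After listen_after( record_result ), every call through myHeuristic also updates last_seen.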

   Now the dynamic functions are pushed onto each pass's gate or
   execute only if someone wants to extend them.  Not one line of code
   was changed in GCC.  This is what I mean by push not pull.

   Consider the alternative, which I call 'pull' because it has to pull
   plugin awareness from the system.  It would require each pass and
   gate to check if anyone was interested, lots of changes to the
   code.  Or every calling site would have to do it, similarly
   unpleasant for most uses.

   This is great when you already have function pointers.  If you don't
   you have to make only minimal changes.  Your code remains efficient
   if no one extends it.

   *Scalability and Granularity*
   The system is very scalable.  Really this is due to the push
   architecture.

   Consider if events were implemented by something like this.  A
   single function:

       void firePluginEvent( int eventId, void* data );

   Every event would be fired by calling through this one function.
   Plugins would register a callback function.


   This is fine when you only have a few events but look what happens
   when you have very fine grained events happening millions of times
   during the compilation.
   Firstly, most plugins are only going to be interested in a few of
   the events.  Why ask them to do the filtering when you don't need
   to?  Each plugin then pays the cost.
   Secondly, for each event firing we have one callback for each
   plugin, even if there are no plugins interested in this event.  If
   the event was SCHEDULE_INSN, or more frequent, that might be a bit
   painful.

   One final point about that style is that now you have to define a
   struct for your data and your code looks less like what you meant.

   With libplugin, on the other hand, you only pay where you have to.
   If no one uses an event then no listeners are added to it.  Also if
   a plugin only listens to one event, that's the only event it'll ever
   have to hear about.  This way you can afford to have very fine grained
   extension points.

   Because it's scalable, we can have very frequently called events or
   heuristics be replaceable.  You can have any granularity you want.
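
To make the contrast with a single firePluginEvent( eventId, data ) dispatcher concrete, here is a minimal sketch of one event with its own typed listener list.  The names (schedule_insn_fn, on_schedule_insn, schedule_insn_listen, schedule_insn_fire) and the integer arguments are assumptions for illustration; they are not libplugin's API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LISTENERS 4   /* hypothetical fixed limit to keep the sketch short */

/* The event has its own signature, so no void* struct is needed. */
typedef void (*schedule_insn_fn)(int insn_uid, int cycle);

struct schedule_insn_event {
    schedule_insn_fn listeners[MAX_LISTENERS];
    size_t count;
};

/* One listener list per event; zero-initialised, so empty by default. */
struct schedule_insn_event on_schedule_insn;

/* Only plugins that care about this event ever register here. */
void schedule_insn_listen(schedule_insn_fn f) {
    assert(on_schedule_insn.count < MAX_LISTENERS);
    on_schedule_insn.listeners[on_schedule_insn.count++] = f;
}

/* With no listeners this is a single loop test and no callbacks. */
void schedule_insn_fire(int insn_uid, int cycle) {
    for (size_t i = 0; i < on_schedule_insn.count; i++)
        on_schedule_insn.listeners[i](insn_uid, cycle);
}

/* Example listener: counts how many instructions were scheduled. */
int insns_seen;
void count_insns(int insn_uid, int cycle) {
    (void)insn_uid; (void)cycle;
    insns_seen++;
}
```

A plugin that never calls schedule_insn_listen is never called back for this event, and it does no filtering on an event id.  In the join-point scheme described earlier the cost can be lower still, since the function pointer is not even replaced until someone listens.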

   *Multiple cooperating plugins*
   I think some of the proposals don't allow multiple plugins or
   plugins aren't able to be extended in the same way that the
   application is.  In libplugin you can have lots of plugins all
   depending on each other.  Plugins can provide extension points as
   well as the application - this means it isn't just a matter of the
   application deciding what's important and everyone else having to
   make do.

   In some senses, this is the difference between a plugin system and
   loading a few shared libraries.  A plugin system provides an
   holistic framework for building and using plugins.

   *Separation of plugins from shared libraries*
   If a plugin is only a shared library, then you have to load that
   library to find out anything about it.  For example, if it isn't for
   your particular application/version then that might cause problems.
   To add any metadata you have to do odd things, like making the path
   to the library be meaningful - it might work for one bit of metadata
   but quickly becomes difficult.

   The other problem with plugins being only shared libraries is that
   you always have to write C code.  What if you just want something
   really simple?  Do you always have to go to the effort of writing a
   shared library, compiling it, putting it in the right place, and so on?

   Libplugin, though, makes every plugin an XML file, not a shared
   library.  The plugin might use a shared library but it might not.
   It can also use more than one.


   This way we gain several advantages.  Firstly, metadata is easy: it
   goes in the XML file.  This means that we can have required and not
   required plugins, plugins for one application/version not another,
   well, all sorts of stuff.  And for none of it do we have to load any
   shared libraries.

   It also means that the user can get lots of use out of the plugin
   system without writing a single line of C code.

   Here's an example.  I use my plugin system mostly for my iterative
   compilation/machine learning experiments.  I write a plugin for each
   element of the iteration space - each such plugin defines the
   compilation strategy for the particular point in the space.  Here's
   one of those files (annotated and modded a bit):

       <?xml version='1.0'?>

       <!-- Only work on GCC 4.3.X -->
       <?gcc version='4.3'?>

       <!-- A plugin that's only loaded if requested (lazy) -->
       <plugin id='strategy' lazy='true'>

           <!-- Bring in another plugin -->
           <!-- I want this plugin to print out each loop as it's -->
           <!-- unrolled - for debugging -->
           <!-- Actually this plugin doesn't have any C code either, it uses -->
           <!--    plugin "message" to do logging -->
           <!--    plugin "gcc-rtl-unroll-and-peel-loops" to log the -->
           <!--    particular event -->
           <requires plugin="gcc-print-unrolled-loops"/>

           <!-- Another one for debugging.  Also has no C code. -->
           <!-- This one prints every pass and the current function to -->
           <!-- an XML file -->
           <requires plugin="gcc-print-passes"/>

           <!-- I need to change the command line.  I don't want to -->
           <!-- mess with any benchmark's makefile, and fortunately the -->
           <!-- "command-line" plugin lets me change arguments easily. -->
           <extension point="command-line.modify">
               <!-- First remove any args matching -O*, -finline* or -funroll* -->
               <remove><arg>-O*</arg></remove>
               <remove><arg>-finline*</arg></remove>
               <remove><arg>-funroll*</arg></remove>

               <insert><arg>-O3</arg><arg>-funroll-all-loops</arg><arg>-fno-inline</arg></insert>
           </extension>

           <!-- I want to record the number of cycles from each function -->
           <!-- Plugin "gcc-perfmon" does that -->
           <extension point="gcc-perfmon.settings">
               <!-- I don't want to include functions input_dsp or -->
               <!-- output_dsp, they're just a bunch of I/O setup stuff. -->
               <instrument>
                   <exclude function="*put_dsp"/>
               </instrument>

               <!-- I don't want to pause the count around some functions -->
               <pause>
                   <exclude function="acos"/>
                   ...
                   <exclude function="vsprintf"/>
               </pause>
           </extension>

           <!-- Change loop unroll factors -->
           <!-- This replaces the default heuristic with one which takes -->
           <!-- values from this file -->
           <!-- We could also have replaced the heuristic ourselves, but -->
           <!-- then we'd have to have a C function to replace it with. -->
           <extension point='gcc-rtl-unroll-and-peel-loops.override'>
               <!-- For loop "adpcm.c/adpcm_decoder/1" unroll it 5 -->
               <!-- times - work out the best type for me -->
               <loop main-input-file='adpcm.c' function='adpcm_decoder'
                     number='1' times='5'/>

               <!-- For loop "rawdaudio.c/main/1" unroll it 2 times - -->
               <!-- work out the best type for me -->
               <loop main-input-file='rawdaudio.c' function='main'
                     number='1' times='2'/>

               <!-- For everything else, leave it to GCC (could remove -->
               <!-- this, it's the default) -->
               <loop main-input-file='*' function='*' number='*'
                     times='gcc-default'/>
           </extension>
       </plugin>

   So, I think you can see that you can do quite a lot without having
   to have a shared library.  As plugins become more capable you should
   be able to do more and more without writing any C code.
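
The patterns in that file (-O*, -funroll*, *put_dsp) read like shell wildcards.  As a rough sketch of how a "command-line.modify" style filter could behave, the code below assumes matching with the semantics of POSIX fnmatch(); whether libplugin matches this way is an assumption, and matches_any, remove_matching and removed_args are hypothetical names.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Returns 1 if name matches at least one of the n glob patterns. */
int matches_any(const char *name, const char *const patterns[], size_t n) {
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (fnmatch(patterns[i], name, 0) == 0)
            return 1;
    return 0;
}

/* Drops every argument that matches a pattern, compacting argv in
   place.  Returns the number of arguments kept. */
size_t remove_matching(const char *argv[], size_t argc,
                       const char *const patterns[], size_t n) {
    size_t kept = 0;
    for (size_t i = 0; i < argc; i++)
        if (!matches_any(argv[i], patterns, n))
            argv[kept++] = argv[i];
    return kept;
}

/* The three <remove> patterns from the strategy file above. */
const char *const removed_args[] = { "-O*", "-finline*", "-funroll*" };
```

Under that assumption, "-O2" and "-funroll-loops" would be removed while "-fno-inline" and "-Wall" would be kept, and the single pattern "*put_dsp" covers both input_dsp and output_dsp.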

*Passes*

Here's a quick description about what happens with passes.  I've split the discussion in two.  The first, short part describes what will go into the next release, in mid-October(ish).  The next part is working but I'm not happy with it yet, so it will wait.


   *Current gcc-pass-manager*
   This plugin provides a number of things.  First, as discussed
   already, there are join points for every pass's gate and execute
   function.  This allows you to turn passes on or off, find out what
   happened or to completely change the behaviour of a pass.  BTW, the
   pass manager also creates names for those passes which don't already
   have them.

   There are also join points around execute_one_pass and
   execute_one_ipa_transform_pass (I'm still on 4.3.1).  These allow
   you to find out what happened to each pass, rather than having to
   listen to the events of individual passes.  You can also change the
   way those functions work.

   *Next gcc-pass-manager*
   Also allows you to add passes.  First, you can just add to the
   managed passes without putting a pass into the compilation order.
       <extension point="gcc-pass-manager.add-pass">
           <pass symbol="pass-symbol-in-shared-lib"/>
       </extension>

   Or you can add one after or before another pass.  At the moment this
   happens only to the first occurrence of the other pass.  This is one
   thing I don't like.
       <extension point="gcc-pass-manager.insert-pass"
                  after="pass-name"> <!-- or before="pass-name" -->
           <pass symbol="pass-symbol-in-shared-lib"/>
           <!-- or if already registered -->
           <pass name="pass-name"/>
       </extension>
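
The "first occurrence only" behaviour can be pictured with a much-simplified pass list.  This is a sketch under assumptions: struct pass here is a hypothetical two-field record rather than GCC's real pass structure, sub-passes are ignored, and insert_pass_after is an illustrative name, not a libplugin function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened pass record (no sub-pass tree). */
struct pass {
    const char *name;
    struct pass *next;
};

/* Links new_pass in after the FIRST pass called `after`.
   Returns 1 on success, 0 if no pass has that name. */
int insert_pass_after(struct pass *list, const char *after,
                      struct pass *new_pass) {
    assert(new_pass != NULL);
    for (struct pass *p = list; p != NULL; p = p->next) {
        if (strcmp(p->name, after) == 0) {
            new_pass->next = p->next;
            p->next = new_pass;
            return 1;   /* stop here: later copies are left alone */
        }
    }
    return 0;
}
```

If the same pass name appears twice in the list, only the first copy gets the new pass after it.  That is the limitation mentioned above; handling every copy would mean continuing the loop and linking in a separate pass record each time.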

   You can also remove passes - again I'm not happy with this yet.
       <extension point="gcc-pass-manager.remove-pass" name="pass-name"/>

   The above control the default pass ordering.  You can also set up
   particular pass orders for certain functions.  I'm still not happy
   with it and it doesn't do IPA passes (though I think I can handle that).

       <extension point="gcc-pass-manager.set-pass-order">
            <!-- Specify which functions to set pass order for -->
            <!-- Glob patterns can be used -->
            <function main-input-file="glob-pattern" name="glob-pattern">
                <!-- Do all the default passes until a given pass name -->
                <default to="pass-name"/>
                <!-- Do some particular passes -->
                <pass name="pass-name"/>
                <pass name="pass-name"/>
                ...
                <!-- Do a bunch of passes from the default pass order -->
                <default from="pass-name" to="pass-name"/>
                ...
                <!-- Do some particular passes -->
                <pass name="pass-name"/>
                <pass name="pass-name"/>
                <!-- Do passes from a given pass to the end of the -->
                <!-- compilation -->
                <default from="pass-name"/>
            </function>

            <!-- If you're thinking that writing that list for each -->
            <!-- function (if there's no good glob pattern) is going -->
            <!-- to be painful, you'd be right.  Except that we use -->
            <!-- XInclude, too, so you can just repeatedly include -->
            <!-- the pass list from another file -->
       </extension>

   One of the things I'm not happy with is due to the ability to have
   multiple copies of a pass in the pass tree.  The other is the tree
   flattening I do for extension point
   gcc-pass-manager.set-pass-order.  I need to think about it for a while.

   Note that the above XML format is for convenience.  You could write
   your own code and replace how passes are done completely if you want.

*Licensing*

   I don't know anything about licensing, but we could do something
   similar to the approach that Joern suggested.  We could only load
   plugins that included the GPL or other approved OSS license at the
   top of the file.  The plugin would then declare that it and
   everything it used was good.  I don't think people could avoid that
   declaration.  Maybe I'm wrong.

What do you all think? Is this interesting?

   Cheers,

   Hugh.


Basile STARYNKEVITCH wrote:

Hugh Leather wrote:
Aye up all,
I've now been reading through some of the list archive. Some of the posts were about how to tell GCC which plugins to load. I thought I'd tell you how libplugin does it.
Thanks for the nice explanation. I'm not sure to understand exactly how libplugin deals with adding passes; apparently, the entire pass manager (ie gcc/passes.c) has been rewritten or enhanced. Also, I did not understood the exact conceptual differences between libplugin & other proposals. Apparently libplugin is much more ambitious.
So we now have many plugin proposals & experiments. However, we do know that there are some legal/political/license issues on these points (with the GCC community rightly wanting as hard as possible to avoid proprietary plugins), that some interaction seems to happen (notably between Steering Committee & FSF), that the work is going slowly (because of lack of resource & labor & funding? at FSF).
My perception is that the issues are not mostly technical, but still political (and probably, as Ian Taylor mentioned it in http://gcc.gnu.org/ml/gcc/2008-09/msg00442.html a lack of lawyer or other human resources at FSF, which cost much more than any reasonable person could afford individually). I actually might not understand why exactly plugins are not permitted by the current GCC licenses.
What I don't understand is
* what exactly do we call a plugin? I feel (but I am not a lawyer) that (on linux) it is any *.so file which is fed to dlopen. I'm not able to point what parts of the GCC license prohibit that (I actually hope that nothing prohibits it right now, if the *.so is compiled from GPLv3-ed FSF copyrighted code. the MELT branch is doing exactly that right now).
* will the runtime license be working for Christmas 2008. [some messages made me think that not, it is too much lawyer work; other messages made me a bit more optimistic; I really am confused]. Of course, I don't want any hard date, but I am in the absolute darkness on the actual work already done on improving the runtime license, and even more on what needs to be fixed. Also, I have no idea of the work involved in writing new licenses (I only know that the GPLv3 effort lasted much more than one year). Did I say that I am not a lawyer, and not understanding even the basic principles of US laws (or perhaps even French ones)?
* what kind of intrusiveness do we want for the plugin machinery. Do we want it to be clean and hence to touch a lot of files (in particular the details of passes & the pass manager), or do we first want some quick and dirty plugin trick merged into the trunk, even if it is imperfect?
* what is the plugin machinery useful for? Only adding optimisation passes, or much more ambitious (adding new front ends, back ends, targets)?
* what is the interaction between the plugin machinery & the rest of GCC (e.g. GGC, dump files, )
* what is the granularity plugins are wanted or needed for? Only whole passes, or something smaller than that (e.g. some specific functions inside specific passes)?
* who really want plugins to happen quick, and which company would invest money [not only code] on that?
* what host system do we want the plugin to work with? Is libtool dynloader enough? Could every non static symbol inside cc1 be visible to the plugin?
* do we really want one single (fits all) plugin machinery inside GCC?
My feeling is that a lot of various technical efforts has already being put into plugins, but that the future runtime license may (or not) impact technicalities (perhaps making some proposed technical solutions impossible). I really don't understand what is the hard limit, i.e. what the FSF or the Steering Committee wants to avoid exactly (obviously proprietary plugins implementing new machine targets are unwanted, but what else; is the goal to only permit FSF copyrighted GPLed plugins; what would be the review policy of code going into plugins?)?
I've got no idea of how would it be hard to make any plugin system accepted into the GCC trunk, and when could that work begins to start (i.e. when to send plugin patches to gcc-patches@). I tend to believe that it the main issue now. Are plugin patches supposed to be welcome -on the gcc-patches@ mailing list, for trunk acceptance- when GCC goes back in stage1? Will the first plugin patches (submitted to gcc-patches@ for acceptance into trunk) be huge or tiny patches? Technically both are possible (of course with different goals & features).
I even don't know what legally a plugin is. For instance, in my MELT branch code is indeed dlopen-ed, but [currently] the C code of the plugin is generated (by the plugin itself) from MELT lisp-like files, which are all inside the MELT branch (GPL-ed, FSF copyrighted) Perhaps that does not even count, from a legal point of view, as a plugin? [I really hope I am not doing unknowingly illegal things on the MELT branch; to calm everyone, of course every line of code there is GPLv3 licenced, FSF copyrighted - even generated code... so I hope that I am not guilty... :-) ].
My guess is that the most visible effect of plugins could be perhaps a tiny side effect: some code could be practically used in gcc, with GPL licence (or LGPL?) inside GCC [since it is dlopen-ed] without being FSF copyrighted, but perhaps the goal of the steering committee is to avoid that.
And I even don't understand who is deciding what on the plugin issues & the runtime license issue.
Regards.
