Modules, Versioning, and Beyond

Bryan C. Warnock Sun, 29 Jul 2001 18:44:41 -0700
I've been contemplating this off and on for a while now, but the lack
of a Grand Unified Theory of Module Versioning has always led me to
shelve whatever thoughts I may have had to the back of my mind - or to
/dev/null.  There was a lot of peripheral discussion, however, here at
TPC, and a lot of similar discussion taking place over on p5p, so I'm going 
to go ahead and do a brain dump of a possible solution.  It is probably an 
undesirable one, but that's okay.  A bad idea is better than no idea (until 
it's actually implemented).

Larry wants more metadata attached to modules - version, author, and
threading capability were mentioned during the State of the Onion;
whether a module has been tested or rated was mentioned elsewhere.
An easier way to have multiple versions of a particular module
installed has been requested several times.  With code-freezing and
bytecode distribution, it may even be necessary to reference
multiple versions of a module from within a single program.  Of course,
the impact to CPAN and CPAN.pm will need to be considered, as would a
look at how it would (or could) bridge a previously installed Perl 5
module base.

It's been fairly well hinted that the 'package' keyword will be
deprecated in favor of 'class' or 'module' (or both).  I'll continue to
use the Perl 5 term for clarity, or some semblance thereof. 
Currently, a 'module' (as one may find on CPAN) doesn't really
correspond with a 'module' (as one would call a .pm file), which
doesn't really correspond with a 'package'.  Multiple packages may be
found inside a .pm module, of which many may be associated within one
CPAN module.  (Many CPAN modules may also be found within a larger
bundle, but that doesn't actually come into play.)  Module versioning,
however, is currently based on the version of the actual package of the
same name as the CPAN module which resides in the .pm module of the
same name as the CPAN module, and is stored as a package variable. 
(Generally speaking.  As long as the package is created, it doesn't
really matter from what module.  And, of course, some authors don't
versionize at all.  But that's beside the point.)

Having Multiple Versions Of The Same Module

Currently, if I want to use someone else's code, I simply 'use' the
module. Perl then meanders through the current @INC to look for the
first version it can find.  Only after it has been loaded is the actual
version ($MODULE::VERSION) checked against your requested version, if
you specified one.  If the version doesn't meet the minimum
requirements, then your program dies.
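
For reference, Perl 5 documents 'use Foo 1.2;' as roughly
BEGIN { require Foo; Foo->VERSION(1.2); Foo->import; }.  Here is a
minimal sketch of that ordering.  Demo::Old and version_ok are made-up
names, and the module is defined inline rather than loaded from disk:

```perl
use strict;
use warnings;

# A stand-in module defined inline; Demo::Old is a made-up name.
package Demo::Old;
our $VERSION = '1.10';
our $LOADED  = 1;          # side effect: the module body has already run

package main;

# UNIVERSAL::VERSION dies if $VERSION is lower than the requested minimum.
sub version_ok {
    my ($package, $minimum) = @_;
    return eval { $package->VERSION($minimum); 1 } ? 1 : 0;
}

print "1.05 ok? ", version_ok('Demo::Old', '1.05'), "\n";
print "1.20 ok? ", version_ok('Demo::Old', '1.20'), "\n";
print "already ran? $Demo::Old::LOADED\n";
```

By the time the check can fail, the module's code has already been
compiled and run.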

Now, it would be much nicer if
    a) it could do the version checking *before* loading (and running)
       the module in question; and
    b) if the module wasn't of a sufficient version, that it would 
       continue down the @INC path in search of one that was.
       
(In the current scheme, the first precludes the second.)
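
Outside the current scheme, both wishes could be approximated by peeking
at the file before loading it.  This is only a sketch: peek_version and
find_at_least are made-up helpers, and the regex is a heuristic that
will miss any module that computes its $VERSION.

```perl
use strict;
use warnings;

# Scan a file for a line that looks like "$VERSION = 1.23" without compiling it.
# This is a heuristic; real modules can set $VERSION in ways a regex cannot see.
sub peek_version {
    my ($file) = @_;
    open my $fh, '<', $file or return;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/;
        return $1 if $line =~ /\$VERSION\s*=\s*['"]?(\d+(?:\.\d+)?)/;
    }
    return;
}

# Return the first file along @path whose declared version is at least $minimum.
sub find_at_least {
    my ($module, $minimum, @path) = @_;
    (my $relative = "$module.pm") =~ s{::}{/}g;
    for my $dir (@path) {
        my $file = "$dir/$relative";
        next unless -r $file;
        my $version = peek_version($file);
        return $file if defined $version && $version >= $minimum;
    }
    return;
}
```

Calling find_at_least('Foo', 1.2, @INC) would skip an early Foo.pm that
declares 1.01 and keep walking the path.  Note that it has to open and
read every candidate, which is not free.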

Furthermore, when you are requesting a module (with or without a
version), what is it that you're really asking for?  In Perl 5:

    use Foo;     # Whatever Foo.pm happens to come first in my path
    use Foo 1.2; # I want at least version 1.2 of Foo.pm.
    
You can't specify a maximum version.  You can't specify an exact
version.  (The former would prevent loading of a module that in turn
requires a higher version of Perl than what you are calling it with. 
The latter truly specifies the exact version of a module you want - for
instance, one that is indeed known to work with the calling code,
regardless of whether or not it actually works.)

So let's revisit what a programmer is really looking for in a module
and its interface.

    a) I need a version of Foo.pm, but I don't care what.  (Actually, I
       do care - I need one that works - but what I'm actually assuming
       here is...)
    b) I need the version of Foo.pm (whatever version it happens to be)
       that appears first in my path.  This is great for module
       testing, etc.
    c) I need a version of Foo.pm (preferably the latest version) that
       meets a particular API that was promised to me.  I do not want
       the very latest version which changed a lot of the functions so
       that my program doesn't work.
    d) I need this exact version of Foo.pm, because I am violating the 
       gentleman's agreement and bypassing the API.  Therefore, I am
       dependent on the module's exact internals (with all its bugs and
       features).
       
Currently, there's no standardization on versioning or what particular
versions mean, amongst the modules.  1.21 may be a bug fix, 1.22 may
add an additional feature.  1.31 may take a feature away.  That
requires a lot of processing to meet our third requirement above - I
almost need a separate API version.

Which, of course, we can do.  Let's just arbitrarily assume that the
major number of the version is equivalent to that version of the API. 
(In other words, Foo 1.05 gives us a promise that it uses the same API
as 1.02 and 1.08.  Foo 2.01 would use a different (however slight) API
from 1.99 and 3.10.  Version 0.x of anything, however, is exempt from
this problem.  It is taken as API in flux, and caveat scriptor.)  We
can then make our various requests as such:

    use Foo;      # Find the first Foo.pm in @INC, and use that.  I
                  # don't care what it is.  If it ends up not working,
                  # tough.
    use Foo 1.03; # Find the first Foo.pm (that is exactly version
                  # 1.03) in @INC and load it in.
    use Foo 1;    # (Or perhaps 'use Foo 1.*') Find the first (or
                  # perhaps most recent version) of Foo.pm that adheres
                  # to its v1 API.
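
The three requests above can be sketched as a single predicate.  The
name satisfies is made up, and I'm assuming minor numbers are compared
as strings (so 1.3 and 1.30 are different revisions), which the scheme
above doesn't actually settle:

```perl
use strict;
use warnings;

# Decide whether an installed version satisfies a request under the
# "major number is the API version" convention.
sub satisfies {
    my ($installed, $request) = @_;
    return 1 unless defined $request;                 # use Foo;
    my ($have_major, $have_minor) = $installed =~ /^(\d+)\.(\d+)$/
        or die "Bad installed version $installed\n";
    my ($want_major, $want_minor) = $request =~ /^(\d+)(?:\.(\d+))?$/
        or die "Bad version $request\n";
    return 0 unless $have_major == $want_major;       # different API
    return 1 unless defined $want_minor;              # use Foo 1;
    return $have_minor eq $want_minor ? 1 : 0;        # use Foo 1.03;
}

print "1.05 against 1:    ", satisfies('1.05', '1'), "\n";
print "2.01 against 1:    ", satisfies('2.01', '1'), "\n";
print "1.05 against 1.03: ", satisfies('1.05', '1.03'), "\n";
```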
                  
One could choose to extend the syntax, I suppose, with ranges of
acceptable versions, or comparative versions, so that one could say
something along these lines:

    use Foo 1.03 .. 1.47;
    
If you deferred loading, you could then use lexical analysis to figure
out what version really needs to be loaded in to satisfy any and all
calling modules, but this is getting extremely complex.  I also don't
want to be traipsing through *all* of @INC to find the file I'm looking
for.  Ideally, I don't want to be reading and sorting directories and
such at all.  I want to specify the file, attempt to open it,
and move on down @INC, until I'm successful.  Ranges and the like
prohibit that, simply because anything comparative must have multiple
items to compare.  (Yes, I'm aware that 1.* is effectively the same
thing.  I'm getting to that.)
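
To make the cost concrete, here is a sketch of what honoring a range
involves once the candidate versions are known.  best_in_range is a
made-up name, and versions are compared as plain decimal numbers, which
is an assumption:

```perl
use strict;
use warnings;

# Pick the highest available version inside an inclusive range.
# Every candidate has to be collected and examined first.
sub best_in_range {
    my ($low, $high, @available) = @_;
    my @matching = sort { $a <=> $b }
                   grep { $_ >= $low && $_ <= $high } @available;
    return @matching ? $matching[-1] : undef;
}

print best_in_range('1.03', '1.47', qw(1.02 1.10 1.39 1.50)), "\n";   # 1.39
```

The list of available versions is exactly the thing I'd rather not have
to gather from every directory in @INC.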

If you look at how regular shared objects are named, they usually contain
within the filename some version information for the library itself -
at a minimum the specific version of an API.  On Linux, at least - and
I am not holding it up as a model, merely an example - they encode down
to the revision number.  Symbolic links or copies of the specific files
are used to provide direct access to the generic API version, or to the
latest version.

    libmylib.so@  ->  libmylib.so.2.1.2
    libmylib.so.2@  ->  libmylib.so.2.1.2
    libmylib.so.2.1.2*
    
That allows specific access against libmylib.so.2.1.2, API coherency
against libmylib.so.2, or generic access against libmylib.so.  Of
course, I don't need to tell you any of this - it's old hat.  Perl
could use something similar with

    Foo.pm@  ->  Foo.pm.1.02
    Foo.pm.1@  ->  Foo.pm.1.02
    Foo.pm.1.02*
    
Of course, with the longer names and multiple dots, you've certainly
got 8.3 issues.  For systems without an equivalent linking system
(symbolic or otherwise), multiple copies of the same module would need
to be kept.  You lose .pm as the file extension for platforms and
applications that do file characterizations based on file extensions. 

An alternate way of representing this is through a directory hierarchy,
vaguely reminiscent of one Perl uses currently.  We can abstract both
the API major version and the minor revision number to two layers of
directories.  So now Foo.pm above could be found as:

    perllib/Foo.pm@  ->  1/02/Foo.pm
    perllib/1/Foo.pm@  ->  02/Foo.pm
    perllib/1/02/Foo.pm
       
This solves the 8.3 issues, but seems to pose a potential NAME_LEN
problem when built upon an already lengthy path in @INC.  In truth, it
adds two levels and up to six characters.  For instance, one possible
location for Foo::Bar::Baz v1.39 would be:

/usr/local/lib/perl6/site_perl/6.0.0/i686-linux-2.4/Foo/Bar/1/39/Baz.pm

And if this were the newest version, the following links would exist:

/usr/local/lib/perl6/site_perl/6.0.0/i686-linux-2.4/Foo/Bar/Baz.pm@  ->
  1/39/Baz.pm 
/usr/local/lib/perl6/site_perl/6.0.0/i686-linux-2.4/Foo/Bar/1/Baz.pm@  ->
    39/Baz.pm
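
A sketch of how an installer might lay this out, assuming a platform
with symbolic links.  install_versioned is a made-up helper, $libdir is
the directory that would normally hold the .pm file, and this version
always repoints the links rather than checking which version is newer:

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Install $source as version $major.$minor of $file under $libdir and point
# the generic links at it.
sub install_versioned {
    my ($libdir, $file, $major, $minor, $source) = @_;
    make_path("$libdir/$major/$minor");
    my $target = "$libdir/$major/$minor/$file";
    open my $out, '>', $target or die "Cannot write $target: $!";
    print {$out} $source;
    close $out or die "Cannot close $target: $!";

    # Relative link targets, as in the listings above.
    my %links = (
        "$libdir/$file"        => "$major/$minor/$file",
        "$libdir/$major/$file" => "$minor/$file",
    );
    for my $link (keys %links) {
        unlink $link if -l $link;
        symlink($links{$link}, $link) or die "Cannot link $link: $!";
    }
    return $target;
}
```

A real tool would compare versions before moving the generic links, and
would need a copy-based fallback where links aren't available.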
 
We can now check for the proper versions without loading the actual
files. In Perl, it would be something like this (assuming, of course,
you could do something like this):

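sub use ($$) {
    my ($module,$version) = @_;
    $module =~ s#::#/#g;
    $module ~= '.pm'; 
    if (defined $version) {
        if ($version =~ /^(\d+)(?:\.(\d+))?/) {
            # The version directories sit directly above the file:
            # Foo::Bar::Baz 1.39 becomes Foo/Bar/1/39/Baz.pm.
            my $vdir = "$1/" ~ (defined $2 ? "$2/" : "");
            $module =~ s{([^/]+)$}{$vdir$1};
        } else {
            croak "Bad version $version\n";
        }
    } 
    # PERL6LIB, when set, replaces @INC here.
    foreach my $path ((exists %ENV{PERL6LIB}) 
                      ? split /:/, %ENV{PERL6LIB} 
                      : @INC) {
        if (-r "$path/$module" ) {
            # load, run, import(), etc.
            return;
        }
    }
    croak "Can't locate $module\n";
}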
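
The same lookup as something that runs under Perl 5 today, as a sketch
only.  versioned_path and locate are made-up names; the version
directories go directly above the file, matching the Foo::Bar::Baz
example earlier:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Map a module name and optional version request to a path relative to a
# library root, e.g. ('Foo::Bar::Baz', '1.39') -> 'Foo/Bar/1/39/Baz.pm'.
sub versioned_path {
    my ($module, $version) = @_;
    my @parts = split /::/, $module;
    my $file  = pop(@parts) . '.pm';
    if (defined $version) {
        $version =~ /^(\d+)(?:\.(\d+))?$/ or croak "Bad version $version";
        push @parts, $1;
        push @parts, $2 if defined $2;
    }
    return join '/', @parts, $file;
}

# Return the first readable match along @path, without loading anything.
sub locate {
    my ($module, $version, @path) = @_;
    my $relative = versioned_path($module, $version);
    for my $dir (@path) {
        return "$dir/$relative" if -r "$dir/$relative";
    }
    croak "Can't locate $relative in (@path)";
}

print versioned_path('Foo::Bar::Baz', '1.39'), "\n";   # Foo/Bar/1/39/Baz.pm
print versioned_path('Foo', '1'), "\n";                # 1/Foo.pm
print versioned_path('Foo'), "\n";                     # Foo.pm
```

Each request costs one file test per directory, with no directory
reads and no sorting.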

One advantage to this scheme is that legacy Perl 5 modules can still be
found, as long as no versioning information is specified.  This also
makes it backwards compatible for Perl 5, and since this is relatively
straightforward, could be implemented now. CPAN and CPAN.pm would
require few changes (if any).  The module packaging utilities (which
produce the actual build and installation tools) would also need to be
patched to recognize the directory structure, identify and create links
to newer modules as necessary, and perhaps remove some older modules,
either automatically, or as specified by the module author. 

(This can currently be done (from a module loading perspective) from within 
Perl 5 now with source filtering.  (Cache the current value of @INC, append 
the version information to each path, require the file, and restore @INC.)  
I've also been looking at a possible source patch, but loading and version 
checking are completely separate beasties, so no quick solution immediately 
jumps out at me.)
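
The @INC juggling in that parenthetical might look like this in Perl 5.
This is a sketch under assumptions: require_versioned is a made-up
helper, and because it appends the version to each path it only finds
top-level modules laid out as <path>/<major>/<minor>/Foo.pm.

```perl
use strict;
use warnings;

# Load a top-level module from the version subdirectory of each @INC entry.
# 'local' restores the original @INC on exit, even if require dies.
sub require_versioned {
    my ($module, $major, $minor) = @_;
    local @INC = map { "$_/$major/$minor" } grep { !ref $_ } @INC;
    require "$module.pm";
    return $INC{"$module.pm"};    # where it was loaded from
}
```

Something like require_versioned('Foo', '1', '02') would then stand in
for 'use Foo 1.02', minus the import() call.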

Of course, Larry also mentioned that he would like to be able to refer
to modules by author, as well.  This, I'm hoping, I either
misunderstood, or falls into the 5%.  Not everyone outside the Perl
development community knows who writes what.  It's easy to take for
granted that everyone knows who Damian is - the truth is, most Perl
users don't, and having multiple implementations of identical modules
within the same namespace and having to differentiate by the author's
name (which may not match their PAUSE ID) is just too confusing for
most.

In addition, anything more than versioning will most likely also
require some sort of registry scheme to be able to evaluate the
modules.  This would need to be combined with any requisite path
information, and also any user registries, to find the file.  And
you'd still have to come up with a naming scheme to keep multiple
versions around on disk.  I don't see that as a win. 

There are, however, the issues of alpha and beta level code.  I've a
few ideas, but none so much better than any other.  I like the idea of
never linking to alpha or beta code abstractly - they must be referred
to directly.  This could be done within an /alpha/ or /beta/
sub-directory beneath the API version that they are premature code for.

When originally discussing this idea, it was expressed that changing
the major number for every insignificant API change would quickly lead
to version bloat.  Perhaps the fear of version bloat will allow for a
longer development cycle to stabilize the API, which would lead to
better design and code.  Of course, over on p5p, some are concerned about 
minor revs past 99, also, so who knows?

Using Multiple Versions Of The Same Module In One Program

There are a lot of reasons why this isn't a good idea, or even possible
in places.  Non-Perl code on the backend, for instance, will probably
prevent multiple versions from being linked in.  But since we're doing
more and more in Perl, I'll bring it up.  If you are already convinced
this is a Bad Idea, then feel free to skip this section.

Given:

perl.pl
    #!/usr/bin/perl6
    use Foo;
    use Bar;
        
Foo.pm
    package Foo;  # yes, Perl 5 terminology
    use Baz;

Bar.pm
    package Bar;
    use Baz;
    
Both Foo and Bar use the Baz module for support.  In this case, Baz was
loaded, compiled, and run within Foo.pm - the 'use' line within Bar.pm
was mostly a no-op.  But Perl 6 is expected to be freezable (as
bytecode) in compilation units, which is basically a chunk of code that
Perl reads in, compiles, and runs.  In the above case, Baz.pm would be
the first compilation unit created, followed by Foo.pm, Bar.pm, and
finally perl.pl.  (When compiling Bar.pm, it sees that Baz.pm has
already been loaded, so it uses the compilation unit that was already
loaded.)  This is all very much the way that Perl 5 does it.  If a
module attempts to load a specific version of another module after any
version had already been loaded, it is still stuck with the previously
loaded version.
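
The Perl 5 behavior is easy to demonstrate.  In this sketch FirstWins is
a made-up module written to two temporary directories; the second
require is a no-op because %INC already has an entry for the file:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Two directories, each holding a different version of a made-up module.
my $old = tempdir(CLEANUP => 1);
my $new = tempdir(CLEANUP => 1);
for my $pair ([$old, '1.02'], [$new, '2.01']) {
    my ($dir, $version) = @$pair;
    open my $out, '>', "$dir/FirstWins.pm" or die "Cannot write: $!";
    print {$out} "package FirstWins;\nour \$VERSION = '$version';\n1;\n";
    close $out or die "Cannot close: $!";
}

{
    local @INC = ($old, @INC);
    require FirstWins;               # loads 1.02 and records it in %INC
}
{
    local @INC = ($new, @INC);
    require FirstWins;               # no-op: %INC already has FirstWins.pm
}

print "loaded version: ", FirstWins->VERSION(), "\n";
print "loaded from:    $INC{'FirstWins.pm'}\n";
```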

In one sense, that's how it needs to be.  Perl is just too dynamic to
be able to trust a module to have multiple instances of itself within
one space - function and variable exporting, for instance - but let's
assume for a moment that we've got some gentleman's agreement in place
that says that we will keep our hands to ourselves.  Can we have
multiple instances?

Each compilation unit will have its own text and data sections,
potentially read-only, along with a fix-up section for handling the
proper offset computations, etc.  Given this, the physical data and
code structure of two versions of the same module won't interfere with
each other.  Of course, they would both be using the same namespace,
would they not?  Not necessarily.  We could take a lesson from the
signature-based function languages and handle transparent namespace
mangling, particularly if modules were lexically scoped within the
calling module (as has been hinted on occasion).  The compiler would
know that the Foo:: that I am referring to is *clearly* Foo__2_12 and
not Foo__2_30, but what havoc would that wreak on code and variable
introspection and other really cool tricks played with the symbol table
and such?  Is this a feature that needs to be supported? *Can* it be
supported?  To me, this seems more trouble than it's worth, but I
have been thinking about it, so I thought I'd throw it out there.
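
For the curious, the mangling can be faked in Perl 5 by rewriting the
package statement before compiling.  This is a toy under the gentleman's
agreement above, not a proposal: mangle and load_mangled are made-up
names, and the real thing would belong in the compiler rather than in a
regex.

```perl
use strict;
use warnings;

# Build the mangled package name: ('Foo', '2.12') -> 'Foo__2_12'.
sub mangle {
    my ($package, $version) = @_;
    (my $suffix = $version) =~ tr/./_/;
    return "${package}__$suffix";
}

# Compile $source under its mangled name by rewriting its package statement.
sub load_mangled {
    my ($package, $version, $source) = @_;
    my $mangled = mangle($package, $version);
    (my $code = $source) =~ s/^(\s*package\s+)\Q$package\E\s*;/$1$mangled;/m;
    eval "$code\n1;" or die "Cannot compile $mangled: $@";
    return $mangled;
}

my $source_2_12 = "package Foo;\nsub api { 'old' }\n1;\n";
my $source_2_30 = "package Foo;\nsub api { 'new' }\n1;\n";

my $foo_for_a = load_mangled('Foo', '2.12', $source_2_12);
my $foo_for_b = load_mangled('Foo', '2.30', $source_2_30);

print "$foo_for_a -> ", $foo_for_a->api, "\n";
print "$foo_for_b -> ", $foo_for_b->api, "\n";
```

Anything inside the module that refers to 'Foo' by name, exports
symbols, or pokes at the symbol table breaks immediately, which is the
havoc I was worried about.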

Querying Module Metadata

We also have all this metadata, but how do we get to it?  I'm expecting
that Damian will say attributes, but I want to at least get out in the
open that we want to be able to query the metadata of a module we use
(or perhaps, one that we haven't used yet) from within our scripts, 
preferably without having to add explicit code to do so.
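
As a baseline, this is about all Perl 5 offers today: package variables
looked up by name.  A sketch only - Demo::Meta and module_metadata are
made up, and $AUTHOR and $THREADSAFE are hypothetical names, since only
$VERSION is an actual convention:

```perl
use strict;
use warnings;

# A made-up module carrying metadata in package variables.
package Demo::Meta;
our $VERSION    = '1.02';
our $AUTHOR     = 'A. N. Author';
our $THREADSAFE = 0;

package main;

# Collect the named package variables of a loaded module into a hash.
sub module_metadata {
    my ($package, @fields) = @_;
    my %meta;
    for my $field (@fields) {
        no strict 'refs';                       # symbolic lookup by name
        $meta{$field} = ${"${package}::$field"};
    }
    return \%meta;
}

my $meta = module_metadata('Demo::Meta', qw(VERSION AUTHOR THREADSAFE));
print "$_ = $meta->{$_}\n" for sort keys %$meta;
```

It only works on a module that has already been loaded, so it does
nothing for the "one that we haven't used yet" case.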

GUIDs

Gurusamy, in his discussion about what he feels Perl 6 should be, proposed 
GUIDs (Globally Unique IDs).  Although he didn't go into detail, GUIDs would 
be one way to encode both version and authoring details much like what Larry 
wanted.  But here we have the chicken-and-the-egg problem.  GUIDs are 
excellent for C2C communications, but horrible for H2C communications.  And 
you'd still need to derive some sort of structure for the modules to coexist 
on disk.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]

