First, my thanks to Leo and Chip for their ongoing work on the
Parrot calling conventions.  I'm looking forward to the outcome.

As of the leo-ctx5 branch, however, I find I'm confused about
what the new conventions are supposed to achieve, and indeed
even how they are going to work when they're finished.  So, this
is going to be a longish message with my initial confusions and
observations about the new calling conventions, in the hopes of
clearing things up a bit.

First, let me provide some context -- in my career I've programmed in a 
lot of different languages and environments, including Perl, C, shells,
awk, C++, Java, PHP, and Pascal, as well as 68000, 6809, Z-80, IBM/360, 
and VAX-11 assembly languages.  So, in learning the new Parrot calling
conventions I find I'm trying to map them into something I'm familiar
with, and having a lot of trouble doing that.  Perhaps Parrot just
doesn't fit well into any of those models.

Also, I've been working under the assumption that it's generally
better (faster/more efficient) for frequently used integer and
string values to be held and manipulated in I and S registers rather 
than PMCs.  Perhaps this assumption is invalid.  Indeed, it's possible
that I'm working under a number of incorrect assumptions about how to
do things in Parrot, so if anyone can think of a better way to 
approach the things that PGE is trying to do, please feel free to
speak up.

With that background out of the way, let me give some examples
of confusion and/or difficulties I'm encountering in moving PGE
to the new calling conventions as implemented in leo-ctx5 (and
as discussed today on #parrot).  My questions and comments below
are formed from today's experience and conversations; it's
entirely possible I'm missing the bigger picture, or that what
I'm finding is all based on a misconception somewhere.

First, from reading pdd03 I was under the impression that
parameter passing was going to end up purely positional --
that each argument would automatically be converted into the
type required by the corresponding target register.  This is
the way I interpreted the words "standard conversion" in pdd03:

    =head3 Type Conversions

    Unlike the C<set_*> opcodes, the C<get_*> opcodes must perform
    conversion from one register type to another.  Here are the conversion
    rules:

    * When the target is an I, N, or S register, storage will 
      behave like an C<assign> (standard conversion).

However, according to Leo's journal, the I, N, and S registers 
do not convert among themselves but instead perform type checking
and throw an exception on mismatches.  (cf. "Type checking"
at http://use.perl.org/~leo/journal/25491.)  

For subroutines that have fixed parameter lists, perhaps this
is desirable.  (Note, however, that I come from a Perl background, 
where autoconversion of values is the norm.)  It does give me 
pause that PIR will autoconvert arguments to and from PMCs but 
not other registers -- i.e.:

    .sub "foo"                         .sub "bar"
        .param pmc arg1                    .param string arg1
        print arg1                         print arg1
    .end                               .end

    foo(1)     # valid                 bar(1)     # type mismatch 

I suppose if we consider pmcs to be a "universal type" and
registers to be "restricted types" this makes sense, but in either
case above I have to write extra statements somewhere to get an 
integer argument into a string register in the subroutine.

But my real confusion comes from the handling of "optional" parameters
in subroutine calls.  PGE uses a number of subroutine calls with 
optional parameters -- I think that this results in code that is
more readable and maintainable, and also can avoid any overhead 
from setting up and passing lots of "dummy" arguments that aren't
going to be used in a particular subroutine invocation anyway.

One of the PGE functions that has trouble with the new calling
conventions is emit(), which is used to generate the PIR instructions 
corresponding to a rule being compiled.  emit() was designed to 
look and act a lot like C's sprintf(), in that it uses %d and %s as 
placeholders for values to be substituted in a string on output:

    emit(code, "if rep == %d goto %s", min, next)

Here, "code" is the accumulated PIR instructions, "min" is an int
register containing the minimum number of repetitions, and "next"
is the string label to be branched to for executing the next
component of the rule.  In the old-style calling conventions, 
emit() was simply defined as:

    .sub "emit" method
         .param pmc code               # accumulating code object
         .param string out             # string to output
         .param string str1            # first %s substitution
         .param string str2            # second %s substitution
         .param string str3            # third %s substitution
         .param int int1               # first %d substitution
         # ...

Since int arguments (min) always went to the int register parameters
and string arguments (next) always went to the string register parameters,
all was fine regardless of the order they appeared in the argument list.  
I could tell how many of each was sent to emit by looking at the 
argcI and argcS pseudo-variables.
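
As a rough illustration of that old behavior, here is a small model
written in Python rather than PIR.  It is only a sketch of the routing
described above, not how Parrot implements it; the function name
bind_by_register_type and the sample values 3 and "R1" are made up for
the example.

```python
def bind_by_register_type(args):
    """Route each argument to the next free slot of its own register type.

    Python ints stand in for I registers, floats for N, strs for S, and
    anything else for P (a PMC).
    """
    slots = {"I": [], "N": [], "S": [], "P": []}
    for value in args:
        if isinstance(value, int):
            slots["I"].append(value)
        elif isinstance(value, float):
            slots["N"].append(value)
        elif isinstance(value, str):
            slots["S"].append(value)
        else:
            slots["P"].append(value)
    # The callee could ask how many of each type arrived (argcI, argcS, ...).
    counts = {"argc" + kind: len(values) for kind, values in slots.items()}
    return slots, counts


code = object()  # placeholder for the accumulating code PMC
fmt = "if rep == %d goto %s"

# Swapping the int and string arguments does not change what the callee sees.
assert bind_by_register_type([code, fmt, 3, "R1"]) == \
       bind_by_register_type([code, fmt, "R1", 3])
```

In this model the order of "min" and "next" in the call is irrelevant,
which is exactly the property the old emit() relied on.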

With the new PIR calling conventions, emit() is to be written
with ":optional" on the optional parameters, like so:

    .sub "emit" method
         .param pmc code               # accumulating code object
         .param string out             # string to output
         .param string str1 :optional  # first %s substitution
         .param string str2 :optional  # second %s substitution
         .param string str3 :optional  # third %s substitution
         .param int int1    :optional  # first %d substitution
         # ...

That's fine, but under the new calling conventions the original
calls to emit() no longer work, because for some reason PIR thinks
that any optional int arguments have to occur *after* any optional 
string arguments, thus requiring:

    emit(code, "if rep == %d goto %s", next, min)

This construction somewhat offends my sense of style,
because the "next" and "min" arguments are reversed from
the order of the %d and %s placeholders in the format string.
Also, it's not clear why PIR requires the integer parameter
to be last in the sequence -- after all, if it could figure out 
that "min" should go in the "int1" parameter (skipping "str2"
and "str3"), why couldn't it figure out the same sort of thing for 

    emit(code, "if rep == %d goto %s", min, next)

such that "min" goes into the first optional int, and "next" goes 
into the first optional string?  Clearly PIR isn't doing type
checking of optional parameters anyway; at most it's doing a
form of "type matching".

But it would be really nice if I didn't have to worry about the 
register types at all.  That is in fact what I thought the new calling
conventions were intended to do -- free me from having to worry
about matching the register types between caller and callee.
(In Perl I don't have to worry about matching argument types --
it just converts values as needed based on context.)

I do notice that I can get this behavior if I use PMCs for the 
arguments of emit():

    .sub "emit" method
         .param pmc code               # accumulating code object
         .param string out             # string to output
         .param pmc sub1 :optional     # first %s substitution
         .param pmc sub2 :optional     # second %s substitution
         .param pmc sub3 :optional     # third %s substitution
         # ...

    # and later...
    emit(code, "if rep == %s goto %s", min, next)

Here, even though "min" is an int argument and "next" is a string
argument, they both get placed into the "sub1" and "sub2" pmc parameters
respectively, and I can just replace the first %s with the string
value of "sub1" and the second %s with the string value of "sub2".

Of course, the downside is that with this approach I'm creating PMCs 
just to get to a string value, and inside emit() I have to convert
each PMC into a string to be able to do various string operations
on the argument.  For arguments that were in string registers to
begin with, we've needlessly converted them into pmcs and back to
string registers again.

Things get really odd if we have a mix of optional pmc and other
parameters.  For example, consider the following:

    .sub "foo"                         .sub "bar"
         .param pmc abc                    .param pmc abc
         .param pmc sub1 :optional         .param string sub1 :optional
         .param pmc sub2 :optional         .param string sub2 :optional
         .param pmc sub3 :optional         .param string sub3 :optional
         .param int int1 :optional         .param int int1 :optional
         ...                               ...

    foo($P0, "hello", 0)               bar($P0, "hello", 0)

For the call to "foo", the arguments end up in the "sub1" and
"sub2" parameters, while in the call to "bar" the arguments end
up in the "sub1" and "int1" parameters.   Thus, in the call to "foo"
if the caller wants to get that 0 argument into the "int1" parameter
it has to be the fifth argument, while in the call to "bar" the zero
can be any argument as long as it's the first int.  So here, the
caller really needs to know the parameter types declared by the callee.

And I was a bit surprised when Leo mentioned on IRC that the 
following is illegal:

    .sub "baz"
        .param pmc abc
        .param string arg1 :optional
        .param int arg2    :optional
        .param string arg3 :optional          # illegal (?)
        
According to Leo, any optional parameters have to be grouped together
by register type (all strings together, all ints together, all pmcs
together, etc.).

So, optional parameters are not purely positional, and they
don't strictly match by register type.  Instead it seems to be a 
hybrid register-type-and-sequence pattern, where values are automatically 
converted if the next argument or parameter in the list is a pmc, 
but otherwise we skip any optional arguments until we find the 
group of parameter registers that match the type of argument 
we're currently at.
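
To make that reading concrete, here is a small Python model of the rule
as I currently understand it.  This is a sketch of my interpretation
only, not Parrot's actual algorithm, and the names register_type and
bind_hybrid are invented for the example.  No real value conversion is
performed; the model just records which parameter each argument lands in.

```python
def register_type(value):
    """Classify a Python value as a stand-in for a Parrot register type."""
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "num"
    if isinstance(value, str):
        return "string"
    return "pmc"  # anything else stands in for a PMC


def bind_hybrid(params, args):
    """params: list of (name, type, optional) tuples; args: list of values."""
    bound = {}
    pos = 0
    for value in args:
        arg_type = register_type(value)
        while True:
            if pos >= len(params):
                raise TypeError(f"no parameter left for argument {value!r}")
            name, ptype, optional = params[pos]
            pos += 1
            # A pmc on either side always binds; otherwise types must match.
            if ptype == "pmc" or arg_type == "pmc" or ptype == arg_type:
                bound[name] = value
                break
            if not optional:
                raise TypeError(f"type mismatch for parameter {name}")
            # Optional parameter of a different register type: skip it.
    return bound


P0 = object()  # placeholder for the $P0 argument
FOO = [("abc", "pmc", False), ("sub1", "pmc", True), ("sub2", "pmc", True),
       ("sub3", "pmc", True), ("int1", "int", True)]
BAR = [("abc", "pmc", False), ("sub1", "string", True), ("sub2", "string", True),
       ("sub3", "string", True), ("int1", "int", True)]

foo_result = bind_hybrid(FOO, [P0, "hello", 0])  # binds abc, sub1, sub2
bar_result = bind_hybrid(BAR, [P0, "hello", 0])  # binds abc, sub1, int1
```

The same three arguments land in different parameters for "foo" and
"bar", and a call that passes an int before a string to a callee like
the ":optional" version of emit() runs out of parameters in this model.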

Personally, I don't think that having Parrot do register type checking
among only the I, S, and N registers is all that useful to me.
(There's no type checking for P registers -- they always convert.)  
And it's not immediately clear to me what advantage we're gaining by
automatically converting values to/from P registers but not the 
others.

I think it would be much more useful -- and result in cleaner PIR
code -- if all parameters were positional, with Parrot automatically 
converting values among the register types as needed, similar to
how Perl (5) handles arguments.  If type checking of the register 
arguments is needed, then perhaps that can be offered as a pragma, 
trace option, errorson flag, or some other option that tells Parrot 
to not autoconvert the I/S/N registers but throw an exception instead.  
Or perhaps we go the other way, and have a pragma on subs that says to 
convert argument values positionally instead of tossing exceptions for
them.
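
As a sketch of what I have in mind, here is a Python model of purely
positional binding with autoconversion, plus an optional strict mode.
Everything in it is an assumption made for illustration: the names
bind_positional, CONVERT, and the strict flag are invented, and Python's
own int()/float()/str() stand in for whatever conversions Parrot would
actually apply.

```python
def register_type(value):
    """Classify a Python value as a stand-in for a Parrot register type."""
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "num"
    if isinstance(value, str):
        return "string"
    return "pmc"  # anything else stands in for a PMC


# Python conversions used as placeholders for Parrot's conversion rules.
CONVERT = {"int": int, "num": float, "string": str, "pmc": lambda v: v}


def bind_positional(params, args, strict=False):
    """params: list of (name, type); the n-th argument goes to the n-th parameter."""
    if len(args) > len(params):
        raise TypeError("too many arguments")
    bound = {}
    for (name, ptype), value in zip(params, args):
        arg_type = register_type(value)
        # In strict mode, I/N/S mismatches are errors; pmc still converts.
        if strict and ptype != "pmc" and arg_type not in (ptype, "pmc"):
            raise TypeError(f"type mismatch for parameter {name}")
        bound[name] = CONVERT[ptype](value)
    return bound


EMIT = [("code", "pmc"), ("out", "string"),
        ("sub1", "string"), ("sub2", "string"), ("sub3", "string")]
code = object()  # placeholder for the accumulating code PMC
result = bind_positional(EMIT, [code, "if rep == %s goto %s", 3, "R1"])
```

Here the int 3 simply becomes the string "3" in "sub1", and the caller
never needs to know which register types the callee declared.  Passing
strict=True gives the exception-throwing behavior instead.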

I have no problem with doing whatever the Parrot design ultimately
requires; it's just that at the moment I don't seem to fully
understand the rationale or design goals of the latest convention.

Lastly, while on the topic of calling conventions, has any thought
been given to a standard convention for named argument passing
in Parrot subroutines?  There are a number of places in PGE where 
that would be really useful (and necessary...).  

Pm
