First, my thanks to Leo and Chip for their ongoing work on the Parrot calling conventions. I'm looking forward to the outcome.

As of the leo-ctx5 branch, however, I find I'm confused about what the new conventions are supposed to achieve, and indeed even how they are going to work when they're finished. So, this is going to be a longish message with my initial confusions and observations about the new calling conventions, in the hopes of clearing things up a bit.

First, let me provide some context -- in my career I've programmed in a lot of different languages and environments, including Perl, C, shells, awk, C++, Java, PHP, and Pascal, as well as 68000, 6809, Z-80, IBM/360, and VAX-11 assembly languages. So, in learning the new Parrot calling conventions I find I'm trying to map them into something I'm familiar with, and having a lot of trouble doing that. Perhaps Parrot just doesn't fit well into any of those models.

Also, I've been working under the assumption that it's generally better (faster/more efficient) for frequently used integer and string values to be held and manipulated in I and S registers rather than PMCs. Perhaps this assumption is invalid. Indeed, it's possible that I'm working under a number of incorrect assumptions about how to do things in Parrot, so if anyone can think of a better way to approach the things that PGE is trying to do, please feel free to speak up.

With that background out of the way, let me give some examples of confusion and/or difficulties I'm encountering in moving PGE to the new calling conventions as implemented in leo-ctx5 (and as discussed today on #parrot). My questions and comments below are formed from today's experience and conversations; it's entirely possible I'm missing the bigger picture, or that what I'm finding is all based on a misconception somewhere.

First, from reading pdd03 I was under the impression that parameter passing was going to end up purely positional -- that each argument would automatically be converted into the type required by the corresponding target register.
This is the way I interpreted the words "standard conversion" in pdd03:

    =head3 Type Conversions

    Unlike the C<set_*> opcodes, the C<get_*> opcodes must perform
    conversion from one register type to another.  Here are the
    conversion rules:

    * When the target is an I, N, or S register, storage will behave
      like an C<assign> (standard conversion).

However, according to Leo's journal, the I, N, and S registers do not convert among themselves but instead perform type checking and throw an exception on mismatches. (cf. "Type checking" at http://use.perl.org/~leo/journal/25491.) For subroutines that have fixed parameter lists, perhaps this is desirable. (Note, however, that I come from a Perl background, where autoconversion of values is the norm.) It does give me pause that PIR will autoconvert arguments to and from PMCs but not other registers -- i.e.:

    .sub "foo"                      .sub "bar"
        .param pmc arg1                 .param string arg1
        print arg1                      print arg1
    .end                            .end

    foo(1)    # valid               bar(1)    # type mismatch

I suppose if we consider pmcs to be a "universal type" and registers to be "restricted types" this makes sense, but in either case above I have to write extra statements somewhere to get an integer argument into a string register in the subroutine.

But my real confusion comes from the handling of "optional" parameters in subroutine calls. PGE uses a number of subroutine calls with optional parameters -- I think that this results in code that is more readable and maintainable, and also can avoid any overhead from setting up and passing lots of "dummy" arguments that aren't going to be used in a particular subroutine invocation anyway.

One of the PGE functions that has trouble with the new calling conventions is emit(), which is used to generate the PIR instructions corresponding to a rule being compiled.
emit() was designed to look and act a lot like C's sprintf(), in that it uses %d and %s as placeholders for values to be substituted in a string on output:

    emit(code, "if rep == %d goto %s", min, next)

Here, "code" is the accumulated PIR instructions, "min" is an int register containing the minimum number of repetitions, and "next" is the string label to be branched to for executing the next component of the rule. In the old-style calling conventions, emit() was simply defined as:

    .sub "emit" method
        .param pmc code        # accumulating code object
        .param string out      # string to output
        .param string str1     # first %s substitution
        .param string str2     # second %s substitution
        .param string str3     # third %s substitution
        .param int int1        # first %d substitution
        # ...

Since int arguments (min) always went to the int register parameters and string arguments (next) always went to the string register parameters, all was fine regardless of the order they appeared in the argument list. I could tell how many of each was sent to emit by looking at the argcI and argcS pseudo-variables.

With the new PIR calling conventions, emit() is to be written with ":optional" on the optional parameters, like so:

    .sub "emit" method
        .param pmc code                  # accumulating code object
        .param string out                # string to output
        .param string str1 :optional     # first %s substitution
        .param string str2 :optional     # second %s substitution
        .param string str3 :optional     # third %s substitution
        .param int int1 :optional        # first %d substitution
        # ...

That's fine, but under the new calling conventions the original calls to emit() no longer work, because for some reason PIR thinks that any optional int arguments have to occur *after* any optional string arguments, thus requiring:

    emit(code, "if rep == %d goto %s", next, min)

This construction somewhat offends my sense of style, because the "next" and "min" arguments are reversed from the order of the %d and %s placeholders in the format string.
Also, it's not clear why PIR requires the integer parameter to be last in the sequence -- after all, if it could figure out that "min" should go in the "int1" parameter (skipping "str2" and "str3"), why couldn't it figure out the same sort of thing for

    emit(code, "if rep == %d goto %s", min, next)

such that "min" goes into the first optional int, and "next" goes into the first optional string? Clearly PIR isn't doing type checking of optional parameters anyway; at most it's doing a form of "type matching".

But it would be really nice if I didn't have to worry about the register types at all. That is in fact what I thought the new calling conventions were intended to do -- free me from having to worry about matching the register types between caller and callee. (In Perl I don't have to worry about matching argument types -- it just converts values as needed based on context.) I do notice that I can get this behavior if I use PMCs for the arguments of emit():

    .sub "emit" method
        .param pmc code               # accumulating code object
        .param string out             # string to output
        .param pmc sub1 :optional     # first %s substitution
        .param pmc sub2 :optional     # second %s substitution
        .param pmc sub3 :optional     # third %s substitution
        # ...

    # and later...
    emit(code, "if rep == %s goto %s", min, next)

Here, even though "min" is an int argument and "next" is a string argument, they get placed into the "sub1" and "sub2" pmc parameters respectively, and I can just replace the first %s with the string value of "sub1" and the second %s with the string value of "sub2".

Of course, the downside is that with this approach I'm creating PMCs just to get to a string value, and inside emit() I have to convert each PMC into a string to be able to do various string operations on the argument. For arguments that were in string registers to begin with, we've needlessly converted them into pmcs and back to string registers again.

Things get really odd if we have a mix of optional pmc and other parameters.
For example, consider the following:

    .sub "foo"                          .sub "bar"
        .param pmc abc                      .param pmc abc
        .param pmc sub1 :optional           .param string sub1 :optional
        .param pmc sub2 :optional           .param string sub2 :optional
        .param pmc sub3 :optional           .param string sub3 :optional
        .param int int1 :optional           .param int int1 :optional
        ...                                 ...

    foo($P0, "hello", 0)                bar($P0, "hello", 0)

For the call to "foo", the arguments end up in the "sub1" and "sub2" parameters, while in the call to "bar" the arguments end up in the "sub1" and "int1" parameters. Thus, in the call to "foo" if the caller wants to get that 0 argument into the "int1" parameter it has to be the fifth argument, while in the call to "bar" the zero can be any argument as long as it's the first int. So here, the caller really needs to know the parameters used in the callee.

And I was a bit surprised when Leo mentioned on IRC that the following is illegal:

    .sub "baz"
        .param pmc abc
        .param string arg1 :optional
        .param int arg2 :optional
        .param string arg3 :optional    # illegal (?)

According to Leo, any optional parameters have to be grouped together by register type (all strings together, all ints together, all pmcs together, etc.).

So, optional parameters are not purely positional, and they don't strictly match by register type. Instead it seems to be a hybrid register-type-and-sequence pattern, where values are automatically converted if the next argument or parameter in the list is a pmc, but otherwise we skip any optional parameters until we find the group of parameter registers that match the type of argument we're currently at.

Personally, I don't think that having Parrot do register type checking among only the I, S, and N registers is all that useful to me. (There's no type checking for P registers -- they always convert.) And it's not immediately clear to me what advantage we're gaining by automatically converting values to/from P registers but not the others.
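
To make the hybrid matching rule described above concrete, here is a toy model of it in Python. It is only a sketch of the behavior as observed in this message, not Parrot's actual algorithm: the names `Param` and `bind_args` are invented for illustration, arguments are written as explicit (register type, value) pairs, and the last parameter of "bar" is assumed to be an int register, as the discussion of "int1" implies.

```python
from collections import namedtuple

# Hypothetical description of one formal parameter: its name, its register
# type ("int", "num", "string", or "pmc"), and whether it is :optional.
Param = namedtuple("Param", "name type optional")


def bind_args(params, args):
    """Bind (type, value) arguments to params using the hybrid rule.

    Models only the behavior described in this message (an assumption),
    not the real Parrot implementation.
    """
    bound = {}
    i = 0  # index of the next parameter still available for binding
    for arg_type, value in args:
        while True:
            if i >= len(params):
                raise TypeError("no parameter left for %s argument" % arg_type)
            p = params[i]
            i += 1
            if "pmc" in (p.type, arg_type) or p.type == arg_type:
                bound[p.name] = value  # a pmc on either side always converts
                break
            if not p.optional:
                raise TypeError("type mismatch for parameter %r" % p.name)
            # Otherwise an optional parameter of another type is skipped.
    for p in params[i:]:
        if not p.optional:
            raise TypeError("missing required parameter %r" % p.name)
    return bound


foo = [Param("abc", "pmc", False), Param("sub1", "pmc", True),
       Param("sub2", "pmc", True), Param("sub3", "pmc", True),
       Param("int1", "int", True)]
bar = [Param("abc", "pmc", False), Param("sub1", "string", True),
       Param("sub2", "string", True), Param("sub3", "string", True),
       Param("int1", "int", True)]
call = [("pmc", "P0"), ("string", "hello"), ("int", 0)]

print(sorted(bind_args(foo, call)))  # parameters filled by the call to "foo"
print(sorted(bind_args(bar, call)))  # parameters filled by the call to "bar"
```

In this model the same argument list fills "sub1" and "sub2" in "foo" but "sub1" and "int1" in "bar", and an emit()-style parameter list rejects an int argument that is followed by a string argument, because the optional string parameters have already been skipped by the time the string arrives.
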
I think it would be much more useful -- and result in cleaner PIR code -- if all parameters were positional, with Parrot automatically converting values among the register types as needed, similar to how Perl (5) handles arguments. If type checking of the register arguments is needed, then perhaps that can be offered as a pragma, trace option, errorson flag, or some other option that tells parrot to not autoconvert the I/S/N registers but throw an exception instead. Or perhaps we go the other way, and have a pragma on subs that says to convert argument values positionally instead of tossing exceptions for them.

I have no problem with doing whatever the Parrot design ultimately requires; it's just that at the moment I don't seem to fully understand the rationale or design goals of the latest convention.

Lastly, while on the topic of calling conventions, has there been any thought given to a standard convention for named argument passing in Parrot subroutines? There are a number of places in PGE where that would be really useful (and necessary...).

Pm