RFC 160 (v1) Function-call named parameters (with compiler optimizations)

Perl6 RFC Librarian Fri, 25 Aug 2000 08:21:35 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Function-call named parameters (with compiler optimizations)

=head1 VERSION

  Maintainer: Michael Maraist <[EMAIL PROTECTED]>
  Date: 25 Aug 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 1
  Number: 160

=head1 ABSTRACT

Function parameters and their positions can be ambiguous in
function-oriented programming.  Hashes offer tremendous help in this
realm, except that error checking can be very tedious.  Hashes also
generally carry a performance penalty.  Thus either of the following
syntaxes is suggested:

 sub foo<a, b, c> {
    return $a + $b * $c;
 } # end foo

 $res = foo( b => 5, c => 8, a => 9 );

OR

 sub foo: params_req_defined(a, b, c) {
   return $a + $b * $c;
 } # end foo

 $res = foo( b => 5, c => 8, a => 9 );

The goal is to enhance functionality, convenience, and performance
where possible with regard to named parameters, with a minimum of
changes, while keeping the feature completely optional and virtually
transparent.  The following is an in-depth analysis of various ways of
accomplishing these goals.

=head1 DESCRIPTION

The current method of parameter prototypes only fills a tiny niche,
which is mainly to offer compile-time checking and to disambiguate
context (as in sub foo($) { }, or sub foo(&$) { }).  No support,
however, is given to hashes, even though they are one of perl's
greatest strengths.  We see them pop up in parameterized function
calls all over the place (CGI, Tk, SQL wrapper functions, etc).  As
noted above, however, it is left to the coder to check the existence
of required parameters, since in this realm the current prototypes are
of no help.  It should not be much additional work to provide an
extension to prototypes that allows the definition of hashes.

The following is a complex example of robust code:

 #!/usr/bin/perl -w
 use strict;

 # IN: hash:
 #         a => '...' # req
 #         b => '...' # req, defined
 #         c => '...' # req, 0 <= c <= MAX_C
 #         d => '..'  # opt
 #         e => '..'  # opt
 #         f => '..'   # opt
 # OUT: xxx
 sub foo {
  my %args = @_;

  # Requires $a
  my $a;
  die "No a provided"
     unless exists $args{a};
  $a = $args{a};

  # Requires non-null $b
  my $b;
  die "invalid b"
     unless exists $args{b} && defined ($b = $args{b});

  #  Requires non-null and bounded $c
  my $c;
  die "Invalid c"
     unless exists $args{c} && defined ($c = $args{c}) && ($c >= 0 && $c < $MAX_C);

  my ( $d, $e, $f ) = @args{ qw( d e f ) };
  ...
 } # end foo

Becomes:

 # IN: ...
 # OUT: ...
 sub foo<a, b, c, d, e, f> {
   # Implicitly defines and assigns my $a through $f

   # Requires non-null $b
   die "invalid b" 
      unless defined $b;

   # Requires non-null and bounded $c
   die "invalid c"
      unless defined $c && ($c >= 0 && $c < $MAX_C);

   ...
 } # end foo

Essentially, perl's compiler can be put to use for hashed-function
calls in much the same way as pseudo-hashes work for structs/objects.
Making this a compile-time check would drastically reduce run-time
errors in code that uses hash-based parameters.  It would also make
the code both more readable AND more efficient.

There are several ways this could go.  The least obtrusive would be
the above: no errors are generated by the compiler.  This simply
serves as an aid to the programmer, alleviating the need for all those
"exists" checks and %args fetches (which are necessary to avoid "use of
undefined" warnings, which are very helpful in large-scale code).
This model, however, is obviously limited in its usefulness.  One
could almost write a pre-processor to perform this activity.
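
As a rough illustration of what such a pre-processor might emit, the
relaxed model can be approximated at run time today.  This is only a
sketch: the helper name named_args is hypothetical and not part of this
proposal, and it returns values instead of creating the lexicals
implicitly, which only the compiler could do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this RFC): return the values of the
# requested keys, in the order requested, from a key => value list.
# Missing keys yield undef; unknown keys are silently ignored.
sub named_args {
    my ( $names, @pairs ) = @_;
    die "odd number of elements in named argument list\n" if @pairs % 2;
    my %args = @pairs;
    return @args{ @$names };
}

sub foo {
    # Stands in for the implicit "my $a .. $c" of sub foo<a, b, c>.
    my ( $a, $b, $c ) = named_args( [qw( a b c )], @_ );
    die "invalid b\n" unless defined $b;
    return $a + $b * $c;
}

my $res = foo( b => 5, c => 8, a => 9 );
print "$res\n";
```

The function body still has to do its own validation, which is exactly
the limitation of the relaxed model noted above.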

At the other extreme, the use of <arg-list> could require these, and
only these fields.  In this manner, the compiler could easily convert
the hash into a fixed parameter listing in a manner similar to the
following:

 sub foo<a, b, c> {
    ...
 }

 foo( c => 1, b => 2, a => 3 );
 foo( 8, 9, 10 );
 foo( %myhash );

Translates to:

 sub foo($$$) {
   my ( $a, $b, $c ) = @_;
   ...
 }

 foo( 3, 2, 1 );
 foo( 8, 9, 10 );
 foo( @myhash{ 'a', 'b', 'c' } );
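
The same reordering can be emulated at run time, without any compiler
support, to show the intended semantics of the strict ("these, and only
these") model.  The wrapper name with_fixed_params is an assumption made
for this sketch; a real implementation would do the reordering at
compile time and pay none of this cost.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a positional sub so it can be called with
# exactly the named parameters listed in @$names, in any order.
sub with_fixed_params {
    my ( $names, $code ) = @_;
    my %allowed = map { $_ => 1 } @$names;
    return sub {
        die "odd number of elements in named argument list\n" if @_ % 2;
        my %args = @_;
        for my $key ( keys %args ) {
            die "unknown parameter '$key'\n" unless $allowed{$key};
        }
        for my $key (@$names) {
            die "missing parameter '$key'\n" unless exists $args{$key};
        }
        # Reorder into the positional layout the body expects.
        return $code->( @args{ @$names } );
    };
}

my $foo = with_fixed_params(
    [qw( a b c )],
    sub { my ( $a, $b, $c ) = @_; return $a + $b * $c; },
);

print $foo->( c => 1, b => 2, a => 3 ), "\n";   # 3 + 2 * 1
```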

To my knowledge, the GNU C compiler does this sort of parameter
reorganizing.  This is also similar to the way parameters are passed to
functions in Python.  Before people make the comment "you know where to
find C and Python", you can't tell me that this doesn't take out the
not-fun parts of coding by helping the developer (and the maintainer,
through the proliferation of named parameters).  For non-trivial
function calls, it is a great benefit to the maintainer to understand
what each parameter is by simply looking at the function call.

An obvious limitation is in the treatment of @_ within the function.
Either the function assumes that it was defined without hash'd
parameters, or the following construct would be needed:

 sub foo($$$$$$) {
   my ( $a, $b, $c ) = @_[ 1, 3, 5 ]; # Minor performance penalty
   ...
 }
 foo( a, 3, b, 2, c, 1 );  # Simple compiler-time reordering
 foo( a, 8, b, 9, c, 10 ); # reverses the trend above
 foo( map { ( $_, $myhash{ $_ } ) } qw( a b c ) );  # Obviously undesirable

I would suggest the former approach, which does actually limit a
special class of function calls (which I refer to hereafter as
chained function calls), where a developer may only be applying a
wrapper to a deeper function.  In this case, the wrapper function will
want to examine only one or two parameters, passing [optionally]
everything to its wrapped function.  Essentially, the above would
require explicit naming of all parameters instead of just passing @_
or making use of perl's "&func;" form, which efficiently passes the
caller's @_ through.  A mere inconvenience at best, though.

Another obvious problem with forced named positions is with
heterogeneous arguments.  First and foremost is the mixing with
class method invocation:

 $obj->foo( a => 1, b => 2 );

Also with Function parameters:

 sub my_cmp(&@) { }
 my_cmp { $_[0] < $_[1] } 5, 6;

Likewise, there are entire classes of functions that have scalars
intermingled with hashes.  Though objects could be taken as a special
case, and the use of explicitly-named-parameters could be optional, I
feel more could be done.

The simplest Hybrid could mingle the two syntaxes:

 sub foo($%) <a,b,c> {
   # Here @_[ 1 .. $#_ ] are handled prior to function call
   my $self = shift;
   ...
 } # end foo

 sub my_cmp(&$$%) <a,b,c> {
   # Here @_[ 3 .. $#_ ] are handled prior to function call
   my ( $sub, $lcmp, $rcmp ) = @_;
   ...
 } # end my_cmp
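
For comparison, this is roughly what the first hybrid has to look like
in current perl: the leading positional argument is peeled off by hand
and the remainder is treated as the named list.  The Counter class and
its add method are invented purely for this sketch.

```perl
use strict;
use warnings;

package Counter;

sub new { return bless { total => 0 }, shift }

# Leading positional argument ($self), followed by named parameters.
sub add {
    my ( $self, %args ) = @_;
    for my $key (qw( a b )) {
        die "missing parameter '$key'\n" unless exists $args{$key};
    }
    $self->{total} += $args{a} + $args{b};
    return $self;
}

package main;

my $obj = Counter->new;
$obj->add( a => 1, b => 2 )->add( b => 10, a => 20 );
print "$obj->{total}\n";
```

Under the proposed ($%) <a,b> form, the existence checks and the %args
copy would move out of the method body.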

Another possible Hybrid could make use of function-attributes:

 sub foo <self,a,b,c> : method, method_self, params_fixed {
    $self->{a} = $a;
    $self->{sum} = $b + $c;
    return $self;
 } # end foo

Here, params_fixed would allow only the named parameters, and
would be a candidate for compiler optimization.  The use of $self in
this fashion is a separate discussion.  The name method_self was a
simple fix for backward compatibility.  Method attributes of this type
could include:

=over 4

=item locked 

In multi-threading, locks the function unless it is also declared as a
method, in which case the object is locked instead.

=item method

Compatibility, used in conjunction with 'locked' to perform object
locking in multi-threading.

=item method_self

Specifies that a self-object reference should be implicitly created
based upon the context.  Other attributes determine exactly how the
lexical variable is generated.  For most cases, this is equivalent to
prepending the function with:

 my $self = shift;

=item params_relaxed

Provides no enforcement of parameters, nor any real compiler
optimizations.  It serves simply to guarantee the generation of the
lexical variables (and assignments from the passed hash if present),
while at the same time explicitly defining the function (for
potential use in a sort of dynamic "reflection" function-attribute
query).  Extra passed parameters are ignored, and missing ones
produce undefs.

This would typically be equivalent to:

 sub foo {
   my ( $a, $b, $c );
   {
     no warnings;
     my %args = @_;
     $a = $args{ a };
     $b = $args{ b };
     $c = $args{ c };
   }
   ...
 }
 
=item params_min

Same as params_fixed, except that extra parameters are ignored.  This
accommodates chaining function calls, where each function will pick and
choose its own parameters, and pass the rest down the chain.
Compiler optimizations for this might be difficult (if possible at
all).  Perhaps something like the following could work:

 sub foo<a, b>: params_min { ... }
 foo( b => 1, a => 2, c => 3 );

Translates to:
 
 sub foo( $$$$;@) {
  my ( $a, $b );
  Internal-if:  if ( called_statically ) {
     ( $a, $b ) = @_[ 1, 3 ];
  } else {  # Called with a dynamic hash
     no warnings;
     my %args = @_;
     ( $a, $b )  = @args{ 'a', 'b' };
  }
  ...
 }

 foo ( a, 2, b, 1, c, 3 );

Obviously, this adds complexity, plus additional information has to be
passed to the function to determine whether an optimization may occur.
Sadly, this might even require adding information to the
caller() function to fulfill the if statement.  This should, however,
be able to be handled under the covers, optimization or no.

=item params_defined

Works in conjunction with params_min / params_fixed and adds the
additional constraint that no fields can be undef.  This would actually
be less optimized than the fixed case since its code would become:

 sub foo($$$) {
   defined($_) or die "undefined parameter" foreach @_;
   my ( $a, $b, $c ) = @_;
   ...
 } # end foo

And the params_min becomes:

 sub foo($$$$;@) {
   my ( $a, $b );
   Internal-if:  if ( called_statically ) {
      ( $a, $b ) = @_[ 1, 3 ];
   } else {  # Called with a dynamic hash
      no warnings;
      my %args = @_;
      ( $a, $b )  = @args{ 'a', 'b' };
   }
   defined($_) or die "undefined parameter" foreach ( $a, $b );
   ...
 } # end foo

=back
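
The chaining behaviour that params_min is meant to support can be
sketched in current perl as follows.  The names inner and outer and the
scale parameter are hypothetical; the point is only that each function
picks out its own keys and forwards the complete list untouched.

```perl
use strict;
use warnings;

# Innermost function: needs 'a' and 'b', ignores anything else.
sub inner {
    my %args = @_;
    for my $key (qw( a b )) {
        die "missing parameter '$key'\n" unless exists $args{$key};
    }
    return $args{a} + $args{b};
}

# Wrapper: examines only 'scale' and passes the whole list down the chain.
sub outer {
    my %args = @_;
    my $scale = exists $args{scale} ? $args{scale} : 1;
    return $scale * inner(@_);
}

print outer( a => 2, b => 3, scale => 10 ), "\n";
```

Because inner tolerates the extra scale key, the wrapper never needs to
know which parameters the wrapped function consumes.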

=head1 IMPLEMENTATION

Various possible implementations are the following:

=over 4

=item sub foo(a, b, c) { ... }

Still compatible with ($$$) due to the presence of \w characters, but
you can't intermix the old and new styles of prototypes in the same
function call.

=item sub foo<a, b, c> { ... }

Currently proposed method.  It looks awkward, but allows mixing
prototype styles in case there are still needs for (&@) <a, b, c>,
and the like.

=item sub foo [ < ( ] $a, $b, $c [ < ) ] { ... }

This just adds '$'s to either style above, which is a matter of taste.
All hash values are scalar, so why should we have to prefix their
counterparts with '$'?  The only answer I can figure is
readability: it stands out more this way, and is more consistent with
the rest of the language.  More to type, in my opinion.

=item sub foo: params(a,b,c) { ... }

Which makes good use of attribute fields, except that it just looks a
little odd ( not that <..> doesn't ).

=item sub foo: method_self, params_req(a,b), params_req_defined(c,d), params_opt(e), params_extra { ... }

This style makes use of four optional function-attributes, which can
be applied in any combination, so long as their parameter names are
mutually exclusive (again, method_self is an independent issue, but is
suggested since mixing hashes and objects can be common).

=over 4

=item method_self

As with the above, this is a suggested optional enhancement which
takes care of named parameters in OO design, by implicitly defining a
$self variable.

=item params_req(a,b,c)

Used to specify which parameters are required (fixed).  If this is the
only one of these attributes used, then full optimizations can occur.

=item params_req_defined(a,b,c)

Same as params_req, but requires fields to be defined as well.

=item params_opt(a,b,c)

Used to specify optional parameters.  Parameters that are not passed
become undef by default.  (Obviously there can be no
params_opt_defined equivalent.)

=item params_extra

Allows @_ to contain additional fields.  This negates optimizations
for req-fields, since the user may have other plans for @_.

=back

=back
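
To make the intended semantics of the four params_* attributes concrete,
here is a run-time checker that mirrors them.  This is a sketch only:
check_params and the layout of its spec hash are assumptions made for
illustration, and a real implementation would perform most of these
checks at compile time.

```perl
use strict;
use warnings;

# Hypothetical runtime checker mirroring the four proposed attributes.
# %$spec keys: req, req_defined, opt (array refs of names), extra (boolean).
sub check_params {
    my ( $spec, @pairs ) = @_;
    die "odd number of elements in named argument list\n" if @pairs % 2;
    my %args = @pairs;

    my @req     = @{ $spec->{req}         || [] };
    my @req_def = @{ $spec->{req_defined} || [] };
    my @opt     = @{ $spec->{opt}         || [] };

    # params_req and params_req_defined: the key must be present.
    for my $key ( @req, @req_def ) {
        die "missing parameter '$key'\n" unless exists $args{$key};
    }
    # params_req_defined: the value must also be defined.
    for my $key (@req_def) {
        die "undefined parameter '$key'\n" unless defined $args{$key};
    }
    # Without params_extra, anything not declared is an error.
    unless ( $spec->{extra} ) {
        my %known = map { $_ => 1 } @req, @req_def, @opt;
        for my $key ( sort keys %args ) {
            die "unexpected parameter '$key'\n" unless $known{$key};
        }
    }
    return \%args;
}

my $spec = { req => ['a'], req_defined => ['c'], opt => ['e'] };
my $args = check_params( $spec, a => undef, c => 4 );
print defined $args->{e} ? "e given\n" : "e defaults to undef\n";
```

Note that a => undef passes, since params_req only demands that the key
be present, while c => undef would be rejected.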

=head1 SUMMARY

In summary, in keeping with perl's spirit, we should definitely not
enforce a new function / method invocation process; not even for
hash-based / named parameters.  Also, it makes little sense to produce
an entirely new syntax for one or two special cases, which only obtain
performance benefits under certain conditions.  This would needlessly
produce legacy code which would be difficult to maintain in the
future.

This RFC suggests a compatible method of named-parameters through the
use of an optional compiler-level hash.  Initial implementations could
all be applied as a form of pre-processor.  Subsequent versions could
internally optimize various special cases.

=head1 REFERENCES

pseudo-hashes