RFC 160 (v2) Function-call named parameters (with compiler optimizations)

Perl6 RFC Librarian Mon, 25 Sep 2000 12:22:52 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Function-call named parameters (with compiler optimizations)

=head1 VERSION

  Maintainer: Michael Maraist <[EMAIL PROTECTED]>
  Date: 25 Aug 2000
  Last Modified: 25 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 160
  Version: 2
  Status: Developing

=head1 CHANGES

Finalized various features by removing many of the options (greatly simplifying
the RFC).  Unified the goals with those of RFC 176 and RFC 273.

=head1 ABSTRACT

Function parameters and their positions can be ambiguous in
function-oriented programming.  Hashes offer tremendous help in this
realm, except that error checking can be very tedious.  Hashes also
carry a general performance cost.

The goal is to enhance functionality, convenience and performance
where possible with regard to named parameters, with a minimum of
changes, while keeping the whole mechanism completely optional and
virtually transparent.  The following is an in-depth analysis of
various ways of accomplishing these goals.

=head1 DESCRIPTION

The current method of parameter prototypes only fills a tiny
niche, which is mainly to offer compile-time checking and to
disambiguate context (as in sub foo($) { }, or sub foo(&$) { }).
No support, however, is given to hashes, even though they are one of
perl's greatest strengths.  We see them pop up in parameterized
function calls all over the place (CGI, Tk, SQL wrapper functions,
etc).  In these cases, however, it is left to the coder to check the
existence of required parameters, since the current prototypes are of
no help here.  It should not be much additional work to provide an
extension to prototypes that allows the definition of hashes.

The following is a complex example of robust code:

 #!/usr/bin/perl -w
 use strict;
 our $MAX_C;  # upper bound for c, assumed to be set elsewhere

 # IN: hash:
 #         a => '...' # req
 #         b => '...' # req, defined
 #         c => '...' # req, 0 <= c <= MAX_C
 #         d => '..'  # opt
 #         e => '..'  # opt
 #         f => '..'   # opt
 # OUT: xxx
 sub foo {
  my $self = shift;
  my %args = @_;

  # Requires $a
  my $a;
  die "No a provided"
     unless exists $args{a};
  $a = $args{a};

  # Requires non-null $b
  my $b;
  die "invalid b"
     unless exists $args{b} && defined ($b = $args{b});

  #  Requires non-null and bounded $c
  my $c;
  die "Invalid c"
     unless exists $args{c} && defined ($c = $args{c}) && ($c >= 0 && $c < $MAX_C);

  my ( $d, $e, $f ) = @args{ qw( d e f ) };
  ...
 } # end foo

Becomes:

 sub foo($%) : method required_fields(a b c) fields(d e f) doc(<<EOS) { 
 # IN: hash:
 #         a => '...' # req;  Do some A
 #         b => '...' # req, defined; Do some B
 #         c => '...' # req, 0 <= c <= MAX_C; Do some C
 #         d => '..'  # opt; Do some D
 #         e => '..'  # opt; Do some E
 #         f => '..'   # opt; Do some F
 # OUT: xxx
 EOS
   my $self = shift;
   my %args : fields(a b c d e f) = @_; # produce an optimized hash that
                                        # is already pre-allocated at compile-time

   # Requires non-null $args{b}
   die "invalid b" 
      unless defined $args{b};

   # Requires non-null and bounded $args{c}
   die "invalid c"
      unless defined $args{c} && ($args{c} >= 0 && $args{c} < $MAX_C);

   ...
 } # end foo

 $obj->foo( c => 3, b => 2, f=> 8, a => 1 );
 # Note the out-of-order keys, and the mixture of optional fields

 foo( $obj, a => 1, b => 2, c => 3 ); # still totally legal
 foo( a => 1, b => 2 ); # compiler-error (invalid num-args)
 foo( 1,2,3,4,5,6,7); # compiler-error, missing args a, b and c
 foo(a,1,b,2,c,3,$obj); # compiler-error, missing args a, b and c 
 # (since they're offset by one)
 my @args = ( a => 1, b => 2, c => 3);
 $obj->foo( @args ); # checking deferred to run-time.  Will be ok.
 my @bad_args = ( b => 8, e => 4 );
 $obj->foo( @bad_args ); # checking deferred to run-time.  Will fail.

Essentially, perl's compiler can be put to use for hash-based function
calls in much the same way as pseudo-hashes work for structs/objects.
Making this a compile-time check would drastically reduce run-time
errors in code that uses hash-based parameters.  It would also make
the code both more readable AND more efficient.

For readability, perl can be queried for the list of allowable options as
well as for general documentation.  In the above, the listing of input options
in the doc(...) block is redundant, both for the code-reader and for the
run-time query, but was provided for completeness.

Note also that the above is compatible with the existing structure.  In fact,
foo requires the old-style prototype to distinguish the "self" variable from
the general-hash arguments.  The "method" attribute is optional, and
could be used in the auto-generation of a $SELF variable.  At the very least, it
allows a run-time description of what the first argument really is.

An important thing to note is that we're not changing the semantics of execution.
Perl subs still look and feel like old-style subs to the user.  They simply act as if
additional run-time checking had occurred.  The only visible difference is that, where
possible, you will get compile-time errors.

In the case that a static parameter list is provided (no dynamically expanded arrays or
hashes), and the strict-hash method is used (see RFC 273), a compile-time
reordering can occur, which has the effect of an array copy of values instead of
the generation of a hash.  So long as the subroutine never makes use of dynamic field
lookup, a hash is never used (except for a behind-the-scenes mapping of dynamic
parameters).
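
The reordering can be pictured with a small run-time sketch. It assumes the slot order is simply the order of the fields(...) declaration. The helper name reorder_args is hypothetical, and in the proposal this mapping would be done by the compiler, not by Perl code at call time.

```perl
use strict;
use warnings;

# Hypothetical helper: map a static "key => value" list onto positional
# slots in the order the fields are declared, so the callee could receive
# a plain array instead of having to build a hash.
sub reorder_args {
    my ($fields, @pairs) = @_;
    die "odd number of named arguments\n" if @pairs % 2;

    # Slot index for each declared field, e.g. a => 0, b => 1, c => 2.
    my %slot;
    @slot{@$fields} = (0 .. $#$fields);

    # Optional fields that were not supplied stay undef.
    my @positional = (undef) x scalar @$fields;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        die "unknown field '$key'\n" unless exists $slot{$key};
        $positional[ $slot{$key} ] = $value;
    }
    return @positional;
}

# Equivalent of $obj->foo( c => 3, b => 2, f => 8, a => 1 ) for fields(a b c d e f)
my @slots = reorder_args([qw(a b c d e f)], c => 3, b => 2, f => 8, a => 1);
# @slots is (1, 2, 3, undef, undef, 8)
```

Because the key order in the call no longer matters once each key has a fixed slot, the callee can read $slots[2] for c without any hash lookup.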

This helps large-scale functions, such as those in Tk, CGI, etc., where there are dozens of
optional parameters that should all be named explicitly in the code.

Proposed attributes for subroutines are as follows:

=over 4

=item locked 

In multi-threading, locks the function unless it is also declared as a
method, in which case the object is locked instead.

=item method

Compatibility; used in conjunction with 'locked' to perform object
locking in multi-threading.  Also allows enhanced documentation and description
of the first argument.

=item fields( keyA keyB keyC ... )

Describes the pseudo-hash / strict-hash allowable parameters (in conjunction with
other attributes).  Hash keys that do not match this list will generate errors
(at compile time if function calls use static lists, or at run-time if expanded
arrays or hashes are used).  The advantage of this approach over using qs-structures
or redefining prototypes (as in "sub foo($a, $b, $c) {") is that
we maintain backward compatibility by providing a logical SUPERSET of attribute
information on the subroutine.  This superset can additionally be extended (as in the
use of required_fields).  It also keeps the field list visible in the prototype.

Key-names may have an optional data-type prefix, such as $, @, %, &, or *, that enforces
the data-type at compile/run-time.  The prefix is not part of the key-name.

=item required_fields( keyA keyB keyC )

Works in conjunction with fields to define the allowable set of hash keys.
This set defines required fields, while the "fields" set defines optional ones.
To reduce confusion and errors, these two sets may overlap.  The total list of
allowable fields is the union of all field-defining attributes.

Key-names may have prefixes as in "fields" above.

=back
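
A minimal run-time sketch of how the two attributes could combine, assuming the checker receives the required_fields and fields lists as array references. check_fields is a hypothetical name; it only illustrates the set logic (allowable = union, overlap harmless, required must be present), not how perl itself would implement it.

```perl
use strict;
use warnings;

# Hypothetical run-time check for a sub declared as
#   sub foo : required_fields(a b c) fields(d e f)
# The allowable keys are the union of both lists (overlap is harmless);
# every required key must be present.
sub check_fields {
    my ($required, $optional, @pairs) = @_;
    die "odd number of named arguments\n" if @pairs % 2;
    my %args    = @pairs;
    my %allowed = map { $_ => 1 } @$required, @$optional;

    my @unknown = grep { !$allowed{$_} } sort keys %args;
    my @missing = grep { !exists $args{$_} } @$required;
    die "unknown field(s): @unknown\n"          if @unknown;
    die "missing required field(s): @missing\n" if @missing;
    return %args;
}

my %ok = check_fields([qw(a b c)], [qw(d e f)], c => 3, b => 2, f => 8, a => 1);
```

With the same lists, check_fields([qw(a b c)], [qw(d e f)], b => 8, e => 4) dies with "missing required field(s): a c", which mirrors the @bad_args call in the earlier example.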

As above, the allowable fields are the union of fields(...) and required_fields(...).
The text inside the parentheses is treated just like qw(...), as in use vars qw(...).
This method allows extensions that could more finely define data-types.  For example,
it is possible (though not currently proposed) that these attribute lists could do the
following:

 sub foo : required_fields( $cnt @list %data ) {
   my %args: fields(cnt list data) = @_;
   print "Cnt = $args{cnt}\n";   # cnt guaranteed to be a scalar
   for my $item ( @{$args{list}} ) { ... } # list guaranteed to be an array ref
   while( my ( $key, $val ) = each %{$args{data}} ) { ... }
     # data guaranteed to be a hash-ref
 }

 my %data = ( a => 1, b => 2 );
 foo( cnt => 5, list => [ 1, 2, 3 ], data => \%data );

The lack of any prefix would mean that any data-type would be allowable.
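
Since typed prefixes are not currently proposed, their exact meaning is open. The sketch below assumes one plausible reading that matches the comments in the example: "$" means a plain, non-reference scalar, and "@", "%", "&" and "*" require an ARRAY, HASH, CODE or GLOB reference. check_types and %ref_type_for are hypothetical names.

```perl
use strict;
use warnings;

# Assumed meaning of each prefix: "$" is a plain (non-reference) scalar;
# "@", "%", "&" and "*" require an ARRAY, HASH, CODE or GLOB reference.
my %ref_type_for = ('@' => 'ARRAY', '%' => 'HASH', '&' => 'CODE', '*' => 'GLOB');

# Hypothetical checker for typed specs such as "$cnt @list %data".
# Specs without a prefix accept any value; absent keys are not checked here.
sub check_types {
    my ($specs, %args) = @_;
    for my $spec (@$specs) {
        my ($sigil, $name) = $spec =~ /^([\$\@%&*]?)(\w+)$/
            or die "bad field spec '$spec'\n";
        next unless $sigil ne '' && exists $args{$name};
        my $actual = ref $args{$name};    # '' for a non-reference
        my $ok = $sigil eq '$' ? $actual eq '' : $actual eq $ref_type_for{$sigil};
        die "field '$name' does not match '$sigil$name'\n" unless $ok;
    }
    return 1;
}
```

Under this assumption the call foo( cnt => 5, list => [ 1, 2, 3 ], data => \%data ) passes, while passing list => {} would be rejected. A value held in a variable can only be checked this way at run-time, which is why such calls are deferred.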

=head1 IMPLEMENTATION

The compilation and run-time code would have to be augmented to handle the
restrictive field-attributes.

=over 4

=item Compile Time

When a new sub statement is detected with fields or required_fields, a strict-hash
structure will have to be internally defined for it (see RFC 273 for details).
If perl is not in strict mode, then calls to the function may already have been
compiled before its prototype was read.  In the interests of efficiency, checking of
those calls will simply be deferred to run-time (through the setting of flags).

For all subsequent invocations of the subroutine, traditional prototype checking
will be augmented with parameter-checking so long as no expansions of arrays or hashes
are detected.  This involves checking the existence of key-names.

If data-type checking is implemented (in perl), and the subroutine has elected to use it,
then static values (of the parameter hash) will be checked for proper data-types.
Any use of variables as values (which is often the case) will defer additional checking
to run-time.

The subroutine-call will be flagged as having been compiler-checked for parameter
presence, and optionally for data-type correctness.  To reduce complexity,
if data-type correctness is required and _any_ parameters are dynamic, then the sub-call
will be flagged for run-time data-type checking.

Optionally, the compilation stage may try to detect usage of a strict-hash for the
hash-parameters, as in:

 sub foo($$$%) : required_fields(a b) {
  my ( $a, $b, $c );
  my %hash : fields(a b);
  ( $a, $b, $c, %hash ) = @_;
  ...
 }

In this case, it might be possible to optimize the last 4 array positions of @_ in
such a way that the hash assignment at the very least bypasses additional run-time
checking of fields, and at most performs a direct array-copy of data, as in:

  my %hash : fields(a b);
  if ( COMPILER_OPTIMIZED && NO_DYNAMIC_FIELDS ) {
   @hash{qw(a b)} = @_[4, 6]; # which internally is @hash[0, 1] = @_[4, 6]
  } else {
   %hash = @_[ 3 .. $#_ ];
  }

This assumes that the sub-call was optimized by re-ordering the fields.  Obviously this
can only work on required_fields.  Reordering fields would also be expensive
in a run-time environment, and it adds complexity when dealing with shifts
of @_, but all of this can be determined at compile-time.  Again, this is an optimization
and thus optional.

=item Run Time

Once a subroutine is called, its arguments are expanded and placed on the call stack
as has occurred historically.  Part of those arguments is a hidden field that describes
how much compile-time optimization has occurred.  This may take the form of a different
type of sub-call op-code (ideal case), or of a hidden stack-parameter.  In
either case, several things are checked for.

If the call was optimized into an array, then no additional computation will occur; @_
will be passed as is.  One possible optimization at this point would be to assign
the trailing hash-parameters to the strict-args-hash directly as above (since the strict
hash is really an array with hash-like access).  This is the ideal case.

If dynamic values were used in the called hash, perl will have to internally check the
data-types for validity if requested (this could be expensive, but if the user really
needed it, they'd have to do it in even more expensive perl-code).  I suggest that we
should not punish good programming practices by adding extra typing or massive
performance penalties.

If the key-names were dynamically expanded, then perl will internally check the key-names
against the sets of required and allowable fields.  The above check for value data-types
is then optionally performed.

=back
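
The strict-hash optimization from the Compile Time item can be mimicked in plain Perl to show that both paths yield the same hash. This is a sketch under stated assumptions: a reordered call always leaves ($x, $y, $z, a => VALUE_A, b => VALUE_B) in @_, and the hidden "compiler-checked" flag is modelled as an explicit first argument. unpack_args and $optimized are hypothetical names.

```perl
use strict;
use warnings;

# Sketch of the two paths for  sub foo($$$%) : required_fields(a b).
# Assumption: a call reordered at compile time always leaves
#   ($x, $y, $z, a => VALUE_A, b => VALUE_B)
# in @_, so the values sit at fixed indices 4 and 6.
# $optimized stands in for the hidden "compiler-checked" flag.
sub unpack_args {
    my ($optimized, @args) = @_;
    my ($x, $y, $z) = @args[0 .. 2];
    my %hash;
    if ($optimized) {
        # Direct copy: no key lookups, no per-call field checking.
        @hash{qw(a b)} = @args[4, 6];
    } else {
        # Generic path: build the hash from whatever pairs were passed.
        %hash = @args[3 .. $#args];
    }
    return ($x, $y, $z, \%hash);
}

my @fast = unpack_args(1, 'x', 'y', 'z', a => 1, b => 2);
my @slow = unpack_args(0, 'x', 'y', 'z', b => 2, a => 1);
```

The fast path is only safe because the compiler guaranteed the key order; the generic path accepts any order, which is why it remains the fallback for dynamic calls.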

=head1 SUMMARY

In summary, in keeping with perl's spirit, we should definitely not
enforce a new function / method invocation process; not even for
hash-based / named parameters.  Also, it makes little sense to produce
an entirely new syntax for one or two special cases, which only obtain
performance benefits under certain conditions.  This would needlessly
produce legacy code which would be difficult to maintain in the
future.

This RFC suggests a compatible method of named-parameters through the
use of an optional compiler-level hash.  Initial implementations could
all be applied as a form of pre-processor.  Subsequent versions could
internally optimize various special cases.  The use of attributes is a logical
extension, since we first define the core of an entity, then extend the description
for the purposes of optimization and narrowing of correctness of code.

This RFC made heavy reference to the proposed built-in pseudo-hash or strict-hash
described in RFC 273, but the beauty is that just about everything is independent
and thus optional.  The following is the minimum defined by this RFC:

 sub foo : fields( a b c) {
  # uses @_ just as before
 }

 foo( @args ); # so long as args defines the hash ( a=> 1, b => 2, c => 3 )

The following is the ideal case (allows all possible optimizations):

 sub foo($%) : fields( $a %b @c ) required_fields( $a ) doc("useful stuff") {
   # uses @_ just as before
   my $scalar = shift;
   my %args : fields ( a b c ) = @_;
 }

 foo( $scalar, a => 5, b => { .. }, c => [.. ] );


=head1 REFERENCES

man perlref: pseudo-hashes

RFC 273: Internal representation of Pseudo-hashes using attributes.

RFC 176: subroutine / generic entity documentation

RFC 57: Subroutine prototypes and parameters

RFC 75: structures and interface definitions

RFC 152: Replace invocant in @_ with self() builtin