RFC 160 (v3) Function-call named parameters (with compiler optimizations)

Perl6 RFC Librarian Sat, 30 Sep 2000 12:47:10 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Function-call named parameters (with compiler optimizations)

=head1 VERSION

  Maintainer: Michael Maraist <[EMAIL PROTECTED]>
  Date: 25 Aug 2000
  Last Modified: 30 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 160
  Version: 3
  Status: Frozen

=head1 CHANGES

Finialized various features by removing many of the options( grealy
simplified the RFC).  Unified the goals with that of RFC 176 and RFC 273.

=head1 ABSTRACT

Function parameters and their positions can be ambiguous in
function-oriented programming.  Hashes offer tremendous help in this
realm, except that error checking can be very tedious. Also, hashes,
in general, take a performance hit.

The goal is to enhance functionality / convinience / performance where
possible in regards to named-parameters, with a minimal of changes.
And, at the same time, allow this to be a completely optional and
virtually transparent process.  The following is an in-depth analysis
of various ways of accomplishing these goals.

=head1 DESCRIPTION

The current method of parameter proto-types only fulfills a tiny
niche, which is mainly to offer compile-type checking and to
disambiguate context ( as in sub foo($) { }, or sub foo(&$) { } ).
No support, however, is given to hashes, even though they are one of
perl's greatest strengths.  We see them pop up in parameterized
function calls all over the place (CGI, tk, SQL wrapper functions,
etc).  As above, however, it is left to the coder to check the
existance of required parameters, since in this realm, the current
proto-types are of no help.  It should not be much additional work to
provide an extension to prototypes that allow the definition of
hashes.

The following is a complex example of robust code:

 #/usr/bin/perl -w
 use strict

 # IN: hash:
 #         a => '...' # req
 #         b => '...' # req, defined
 #         c => '...' # req, 0 <= c <= MAX_C
 #         d => '..'  # opt
 #         e => '..'  # opt
 #         f => '..'   # opt
 # OUT: xxx
 sub foo {
  my $self = shift;
  my %args = @_;

  # Requires $a
  my $a;
  die "No a provided"
     unless exists $args{a};
  $a = $args{a};

  # Requires non-null $b
  my $b;
  die "invalid b"
     unless exists $args{b} && defined ($b = $args{b});

  #  Requires non-null and bounded $c
  my $c;
  die "Invalid c"
     unless exists $args{c} && defined ($b = $args{b}) && ($c >= 0 && $c <
$MAX_C);

  my ( $d, $e, $f ) = @args{ qw( d e f ) };
  ...
 } # end foo

Becomes:

 sub foo($%) : method required_fields(a b c) fields(d e f) doc(<<EOS) {
 # IN: hash:
 #         a => '...' # req;  Do some A
 #         b => '...' # req, defined; Do some B
 #         c => '...' # req, 0 <= c <= MAX_C; Do some C
 #         d => '..'  # opt; Do some D
 #         e => '..'  # opt; Do some E
 #         f => '..'   # opt; Do some F
 # OUT: xxx
 EOS
   my $self = shift;
   my %args : fields(a b c d e f) = @_; # produce optimized hash
    that is already pre-allocated at compile-time.

   # Requires non-null $args{b}
   die "invalid b"
      unless defined $args{b};

   # Requires non-null and bounded $args{c}
   die "invalid c"
      unless defined $args{c} && ($args{c} >= 0 && $args{c} < $MAX_C);

   ...
 } # end foo

 $obj->foo( c => 3, b => 2, f=> 8, a => 1 );
 # Note the out-of order, and the mixture of optional fields

 foo( $obj, a => 1, b => 2, c => 3 ); # still totally legal
 foo( a => 1, b => 2 ); # compiler-error (invalid num-args)
 foo( 1,2,3,4,5,6,7); # compiler-error, missing args a, b and c
 foo(a,1,b,2,c,3,$obj); # compiler-error, missing args a, b and c
 # (since they're offset by one)
 my @args = ( a => 1, b => 2, c => 3);
 $obj->foo( @args ); # checking-deffered to run-time.  Will be ok.
 my @bad_args = ( b => 8, e => 4 );
 $obj->foo( @bad_args ); # checking-deffered to run-time.  Will fail.

Essentially, perl's compiler can be put to use for hashed-function
calls in much the same way as pseudo hashes work for structs/objects.
Making this a compile-time check would drastically reduce run-time errors
in code (that used hash-based parameters).  It would also make the code
both more readible AND more efficient.

For readibility, perl can be quiried for the list of allowable options as well
as general documentation.  In the above, the listing of Input options would
have been redundant, for both the code-reader, and the run-time query, but was
provided for completeness.

Note also that the above is compatible with the existing structure.  In fact,
foo required the old-style prototype to distinguish the "self" variable from
the general-hash arguments.  The use of the attribute "method" was optional,
and could be used in the auto-generation of a $SELF variable.  At the very
least, it allows a run-time description of what the first argument really-is.

An important thing to note is that we're not changing the functionality of
execution.  Perl sub's still look and feel like old-style subs to the user.
They simply act as if additional run-time checking has occured.  The only
physical difference is that, where possible, you will find compile-time errors.

In the case that a static parameter list is provided (no dynamically expanded
array's / hashes), and the strict-hash method is used (see RFC 273), then a
compile-time reordering can occur, which has the effect of an array copy of
values instead of the generation of a hash.  So long as the subroutine never
makes use of dynamic field lookup, a hash is never used (except for a
behind-the-schenes mapping of dynamic parameters).

This helps large-scale functions, such as TK / CGI / etc, where there are
dozens of optional parameters that should all be called explicitly in the code.

Proposed attributes for subroutines are as follows:

=over 4

=item locked

In multi threading, locks the function unless is also declared as a
method, in which the object is instead locked.

=item method

Compatibility, used in conjunction with 'lock' to perform object locking in
multi-threading.  Also allows enhanced documentation and description of the
first argument.

=item fields( keyA keyB keyC ... )

Describes the pseudo-hash / strict-hash allowable parameters (in conjunction
with other attributes).  Hash keys that do not match this list will generate
errors (at compile time if function-calls are static lists, or at run-time, if
expanded array's/ hashes are used).  The advantage of this approach over using
qs-structures or through redefinition of proto-types (as in "sub foo($a, $b,
$c) {") is that we maintain back-ward compatibility, by providing a logical
SUPERSET of attribute information on subroutine.  This superset can
additionally be extended (as in the use of required_fields).  It also allows
the field list to be visible at the prototype.

key-names may have an optional data-type prefix, such as $, %, %, &, * that
enforce the data-type at compile/runtime.  They would not be part of the
key-name.

=item required_fields( keyA keyB keyC )

Works in conjunction with fields to define the list of allowable set of hash
keys.  This set defines required fields while the "fields" set defines optional
ones.  To reduce confusion, and errors, these two sets may over-lap.  The total
list of optional fields is the union of all field-defining attributes.

key-names could have prefixes as in "fields" above.

=back

As above, The allowable fields is the union of fields(..) and
required_fields(...).  The text representing "..." is treated just like qw(...)
as would be use vars qw(...).  This method allows extensions that could more
finely define data-types.  For example, it is possible (though not currently
proposed) that these attribute lists could do the following:

 sub foo : required_fields( $cnt @list %data ) {
   my %args: fields(cnt list data) = @_;
   print "Cnt = $args{cnt}\n";   # cnt garunteed to be a scalar
   for my $item ( @{$args{list}} ) { ... } # list garunteed to be an array ref
   while( my ( $key, $val ) = %{$args{data}} ) { ... }
     # data garunteed to be a hash-ref
 }

 my %data = ( a => 1, b => 2 );
 foo( cnt => 5, list => [ 1, 2, 3 ], data => \%data );

The lack of any prefix would mean that any data-type would be allowable.

=head1 IMPLEMENTATION

The compilation and run-time code would have to be augmented to handle the
restrictive field-attributes.

=over 4

=item Compile Time

When a new sub statement is detected with fields or required_fields, a
strict-hash structure will have to be internally defined for it (see RFC 273
for details).  If perl is not in strict mode, then it's possible that lazy
execution of functions may have been compiled prior to reading the proto-type.
In the interests of efficiency, they will simply have to be deferred to
run-time checking (through the setting of flags).

For all subsequent invocations of the subroutine, traditional proto-type
checking will be augmented with parameter-checking so long as no expansions of
array's / hashes are detected.  This will involve checking the existence of
key-names.

If data-type checking is implemented (in perl), and the subroutine has elected
to use it, then static values (of the parameter hash) will be checked for
proper data-types.  Any use of variables as values (which is often the case)
will defer additional checking to run-time.

The subroutine-call will be flagged as having been compiler-checked for
parameter presence, and optionally data-type correctness.  In order to
alleviate complexity, if data-type correctness is required, and _any_
parameters are dynamic, then the sub-call will be flagged for run-time
data-type-correctness-checking.

Optionally, the compilation stage may try and detect usage of a strict-hash for
the hash-parameters.  As in:

 sub foo($$$%) : required_fields(a b) {
  my ( $a, $b, $c );
  my %hash : fields(a b);
  ( $a, $b, $c, %hash ) = @_;
  ...
 }

In this case, it might be possible to optimize the last 4 array possitions of
@_ in such a way that the hash assignment at the very least by-passes
additional run-time checking of fields, and at the most, performs a direct
array-copy of data as in:

  my %hash : fields(a b);
  if ( COMPILER_OPTIMIZED && NO_DYNAMIC_FIELDS ) {
   @hash{qw(a b)} = $_[4, 5]; #which internally is @hash[0, 1] = $_[4, 5]
  } else {
   %hash = @_[ 3 .. $#_ ];
  }

This assumes that the sub-call was optimized by re-ordering the fields.
Obviously this can only work on required_fields.  Additionally, reordering of
fields would be expensive in a run-time environment.  Additionally, this adds
complexity when dealing with shift's of @_, but this can all be determined at
compile-time.  Again, this is an optimization and thus optional.

=item Run Time

Once a subroutine is called, it's arguments are expanded and placed on the call
stack as has occured historically.  Part of those arguments is a hidden field
that describes how much compile-time optimization has occured.  This may take
the form of a different type of sub-call op-code (ideal case), or in the form
of a hidden stack-parameter.  In either case, several things are checked for.

If the call was optimized into an array, then no additional computation will
occur, @_ will be passed as is.  One possible optimization at this point would
be to assign the trailing hash-parameters to the strict-args-hash directly as
above (since the strict hash is really an array with hash-like-access).  This
is the ideal case.

If dynamic values were used in the called hash, perl will have to internally
check the data-types for validity if requested (this could be expensive, but if
the user really needed it, they'd have to do it in even more expensive
perl-code).  I suggest that we should not punish good programming practices by
adding extra typing or massive performance penalties.

If the key-names were dynamically expanded, then perl will internally check
key-names for the set of required and allowable.  The above check for
value-data-types is then optionally performed.

=back

=head1 SUMMARY

In summary, in keeping with perl's spirit, we should definately not
enforce a new function / method invocation process; not even for
hash-based / named parameters.  Also, it makes little sence to produce
an entirely new syntax for one or two special cases, which only obtain
performance benifits under certain conditions.  This would needlessly
produce legacy code which would be difficult to maintain in the
future.

This RFC suggests a compatible method of named-parameters through the use of an
optional compiler-level hash.  Initial implementations could all be applied as
a form of pre-processor.  Subsequent versions could internally optimize various
special cases.  The use of attributes is a logical extension, since we first
define the core of an entity, then extend the description for the purposes of
optimization and narrowing of correctness of code.

This RFC made heavy reference to the proposed built-in-pseudo-hash or
strict-hash described in RFC 273, but the beauty is that that just about
everything is independant and thus optional.  The following is the minimum
defined by this RFC:

 sub foo : fields( a b c) {
  # uses @_ just as before
 }

 foo( @args ); # so long as args defines the hash ( a=> 1, b => 2, c => 3 )

The following is the ideal case (allows all possible optimizations:

 sub foo($%) : fields( $a %b @c ) required_fields( $a ) doc("useful stuff")
{
   # uses @_ just as before
   my $scalar = shift;
   my %args : fields ( a b c ) = @_;
 }

 foo( $scalar, a => 5, b => { .. }, c => [.. ] );


=head1 REFERENCES

Thread

man perlref: pseudo-hashes

RFC 273: Internal representation of Pseudo-hashes using attributes.

RFC 176: subroutine / generic entity documentation

RFC 57: Subroutine prototypes and parameters

RFC 75: structures and interface definitions

RFC 152: Replace invocant in @_ with self() builtin
RFC 160 (v3) Function-call named parameters (with compiler optimizations)

Reply via email to