RFC: type inference

Steve Fink Tue, 01 Aug 2000 14:54:32 -0700

=head1 TITLE

type inference

=head1 VERSION

  Maintainer: Steve Fink <[EMAIL PROTECTED]>
  Date: 1 Aug 2000
  Version: 0 (unreleased)
  Mailing List: [EMAIL PROTECTED]
  Number: (unassigned)

=head1 ABSTRACT

Types should be inferred whenever possible, and optional type qualifiers
may be used to gain whatever level of type stricture desired.

=head1 DESCRIPTION

For large systems, and often for small ones, type checking is
extremely valuable as a way of eliminating bugs at compile time and
avoiding errors while making global changes. I propose that we create
a type hierarchy, such as

   any
      list
         list(T)
      hash
         hash(T -> T)
      scalar
         reference
             ref(T)
         nonref
             number
                integer
      void

(This is just a sketch; there are many ways of skinning this cat.)
By default, only constants would start out with a known type; every
node in the parse tree would then be assigned a type by inference.
Variables would not have a single type; they would have a possibly
different type after every assignment. So using the default rules

   1 $x = 3;
   2 $x .= "x";
   3 $h{$x} = \$x;
   4 $h{foo} = "bar";
   5 $x = f();

I<$x> would have type C<number> after line 1 and C<nonref> after
line 2. I<%h> would have type C<< hash(nonref -> ref(nonref)) >> after
line 3. Line 4 stores a C<nonref> value, so the value type becomes the
nearest common ancestor of C<ref(nonref)> and C<nonref>, resulting in
C<< hash(nonref -> scalar) >>. Line 5's effect depends on whether
C<f()>'s type is known. If not, then I<$x> will have type C<any> after
line 5.
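
The nearest-ancestor step for I<%h> can be sketched in a few lines. The
Python below is only an illustration under stated assumptions, not part
of the proposal: the names C<PARENT>, C<ancestors>, and C<join> are
hypothetical, types are modeled as strings and tuples, and joining the
parameters of two C<hash> types pairwise is an assumed reading of the
rule rather than something the RFC spells out.

```python
# Parent links for the atomic types in the RFC's sketch hierarchy.
PARENT = {
    "list": "any", "hash": "any", "scalar": "any", "void": "any",
    "reference": "scalar", "nonref": "scalar",
    "number": "nonref", "integer": "number",
}

# Assumption: a parameterized type sits directly under its bare constructor.
BARE = {"ref": "reference", "list": "list", "hash": "hash"}


def ancestors(t):
    """Return t followed by all of its ancestors, ending at 'any'.

    Atomic types are strings; parameterized types are tuples such as
    ("ref", T) or ("hash", K, V).
    """
    chain = [t]
    if isinstance(t, tuple):
        t = BARE[t[0]]
        chain.append(t)
    while t != "any":
        t = PARENT[t]
        chain.append(t)
    return chain


def join(a, b):
    """Nearest common ancestor of two types."""
    # Same constructor and arity: join the parameters pairwise (assumed rule).
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        return (a[0],) + tuple(join(x, y) for x, y in zip(a[1:], b[1:]))
    seen = ancestors(b)
    for t in ancestors(a):
        if t in seen:
            return t
    return "any"


# %h after line 3, then the key/value types stored by line 4.
after_line_3 = ("hash", "nonref", ("ref", "nonref"))
line_4_entry = ("hash", "nonref", "nonref")
after_line_4 = join(after_line_3, line_4_entry)
```

Here C<after_line_4> comes out as the tuple form of
C<< hash(nonref -> scalar) >>, matching the walk-through above.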

Notice that so far, all existing programs will always typecheck
successfully, so no burden has been placed on the programmer who does
not want types.

Now say we insert C<my $x : number> (or the equivalent in some other
syntax) at the beginning of the example. That asserts that I<$x> will
I<always> be of type C<number>, so we will flag a type error on line 2,
and issue an optional warning on line 5 if the return type of C<f()> in
scalar context is unknown.

Note that error messages are only generated when two things with
strong types collide. So C<my ($x : integer) = /(\d.*)/> will not
complain, but C<my $x : integer = "string"> will.
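
One way to read the "strong types collide" rule is sketched below in
Python. This is an assumption-laden illustration, not the proposed
implementation: the function names and the C<value_is_strong> flag are
hypothetical, and treating constants and built-in operator results as
strong while regex captures and unknown function results are weak is a
guess at the intended split.

```python
# Parent links for the atomic types in the RFC's sketch hierarchy.
PARENT = {
    "list": "any", "hash": "any", "scalar": "any", "void": "any",
    "reference": "scalar", "nonref": "scalar",
    "number": "nonref", "integer": "number",
}


def is_subtype(t, ancestor):
    """True if t equals ancestor or lies below it in the hierarchy."""
    while t != ancestor:
        if t == "any":  # reached the root without meeting ancestor
            return False
        t = PARENT[t]
    return True


def check(declared, value_type, value_is_strong):
    """Classify an assignment to a variable with a declared type.

    Returns "ok" when the value provably fits, "error" when a strong
    value type conflicts with the declaration, and "unchecked" when the
    value type is weak (a candidate for an optional warning at most).
    """
    if is_subtype(value_type, declared):
        return "ok"
    if value_is_strong:
        return "error"
    return "unchecked"


# my ($x : integer) = /(\d.*)/    -- capture type is weak (assumed)
regex_case = check("integer", "nonref", False)
# my $x : integer = "string"      -- a constant is strong
string_case = check("integer", "nonref", True)
# my $x : number; $x = f()        -- return type of f() unknown
unknown_call = check("number", "any", False)
```

Under these assumptions only C<string_case> is an error; the other two
are left unchecked, which is where an optional warning could be hooked
in.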

(I am leaving out a lot of details, such as what happens to the type
of I<%h> if just after line 3 you say C<$x = [[]]>, what happens to
the types of all accessible variables on an C<eval "">, how function
types work, and a hundred other messy problems. But even if lots of
stuff gets promoted to type C<any>, I still think that types will be
very useful within individual subroutines and other isolated areas.)

=head1 IMPLEMENTATION

I propose not changing runtime behavior at all; in the case of 
C<my ($x : integer) = /(\d.*)/>, I<$x> may actually end up containing a
non-integral string with no warning issued. If you want a warning,
write your own RFC. ;-)

Implementation for the most part is straightforward type inference
using unification. The wrinkles come from how complicated the type
hierarchy is, and from where we want to strike the balance between
false positives and false negatives. (Type theorists do not allow false
negatives, but I'm not a type theorist, and their motivation for that
stance is making it safe for run-time behavior to depend on types,
which this proposal does not do.)
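
For readers unfamiliar with unification, here is a minimal Python
sketch of the textbook first-order algorithm applied to type terms such
as C<list(T)>. It is an illustration only: the C<Var> class and the
C<resolve>, C<occurs>, and C<unify> helpers are hypothetical names, and
plain unification as shown ignores the subtype hierarchy entirely, so
combining the two would be part of the wrinkles mentioned above.

```python
class Var:
    """A type variable, such as the T in list(T)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


def resolve(t, subst):
    """Follow bindings until t is a non-variable or an unbound variable."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if variable v appears inside type term t."""
    t = resolve(t, subst)
    if t is v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False


def unify(a, b, subst):
    """Return a substitution making a and b equal, or None on a clash.

    Atomic types are strings; constructed types are tuples whose first
    element is the constructor name, e.g. ("hash", K, V).
    """
    a, b = resolve(a, subst), resolve(b, subst)
    if a is b:
        return subst
    if isinstance(a, Var):
        # Refuse to build an infinite type such as T = list(T).
        return None if occurs(a, b, subst) else {**subst, a: b}
    if isinstance(b, Var):
        return unify(b, a, subst)
    if isinstance(a, str) or isinstance(b, str):
        return subst if a == b else None
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst


# Matching hash(K -> ref(V)) against hash(nonref -> ref(nonref)).
K, V = Var("K"), Var("V")
bindings = unify(("hash", K, ("ref", V)),
                 ("hash", "nonref", ("ref", "nonref")), {})
```

After the call, C<bindings> maps both C<K> and C<V> to C<nonref>;
unifying C<list(T)> with a C<hash> type instead returns C<None>.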