PDD 4: Internal data types

Dan Sugalski Fri, 02 Mar 2001 08:59:15 -0800
Yes, I know I promised the GC PDD, but this was simpler and half finished.
Now it's all finished, and can be referenced by both the vtable PDD and the
utility functions PDD.

-----Cut here with a sharp knife--------
=head1 TITLE

Perl's internal data types

=head1 VERSION

1

=head2 CURRENT

     Maintainer: Dan Sugalski <[EMAIL PROTECTED]>
     Class: Internals
     PDD Number: 4
     Version: 1
     Status: Developing
     Last Modified: 1 March 2001
     PDD Format: 1
     Language: English

=head2 HISTORY

=over 4

=item Version 1

First version

=back

=head1 CHANGES

None. First version.

=head1 ABSTRACT

This PDD describes perl's known internal data types.

=head1 DESCRIPTION

This PDD details the primitive data types that the perl core knows how
to deal with. These types are lower-level than perl's classes (see the
GLOSSARY for the distinction).

=head1 IMPLEMENTATION

=head2 Integer data types

Integer data types are generically referred to as C<INT>s. There is an
C<INT> typedef that is guaranteed to hold any integer type.

=over 4

=item Platform-native integer

These are whatever size native integer was chosen at perl
configuration time. The C-level typedefs C<IV> and C<UV> give you a
platform-native signed and unsigned integer, respectively.

=item Arbitrary precision integers

Big integers, or bigints, are arbitrary-length integer numbers. The
only limit on the number of digits in a bigint is the lesser of the
amount of memory available and the maximum value that can be
represented by a C<UV>. This will generally allow at least 4 billion
digits, which ought to be far more than enough for anyone.

The C structure that represents a bigint is:

   struct bigint {
     void *num_buffer;
     UV length;
     IV exponent;
     UV flags;
   };

=begin question

Should we scrap the buffer pointer and just tack the buffer on the end
of the structure? Saves a level of indirection, but means if we need
to make the buffer bigger we have to adjust anything pointing to it.

=end question

The C<num_buffer> pointer points to the buffer holding the actual
number, C<length> is the length of that buffer, C<exponent> is the
base-10 exponent for the number (so 2e4532 doesn't take up much
space), and C<flags> holds any flags for the bigint.

B<Note:> The flags and exponent fields may go largely unused, but are
included to make the base structure identical in size and field types
to other structures. They may be removed before the first release of
perl 6.

=back
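
As a rough illustration of the bigint layout, here is a minimal C sketch. It assumes, since this PDD does not say, that C<num_buffer> holds one decimal digit per byte, most significant digit first, and it uses C<intptr_t> and C<uintptr_t> as stand-ins for whatever C<IV> and C<UV> perl's configuration would pick. The helpers C<bigint_digit_count> and C<bigint_to_u64> are hypothetical names, not part of any perl API. Under these assumptions, 2e4532 needs a single stored digit and an exponent of 4532.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-ins for the IV/UV typedefs that perl's
   configuration step would pick on a real platform. */
typedef intptr_t  IV;
typedef uintptr_t UV;

struct bigint {
    void *num_buffer;   /* digits, most significant first (assumed) */
    UV    length;       /* number of bytes in num_buffer */
    IV    exponent;     /* base-10 exponent */
    UV    flags;
};

/* Hypothetical helper: total decimal digits in the integer, i.e. the
   stored digits plus one trailing zero per unit of exponent.
   Assumes one digit per byte, no leading zeros, exponent >= 0. */
static UV bigint_digit_count(const struct bigint *b)
{
    assert(b->exponent >= 0);
    return b->length + (UV)b->exponent;
}

/* Hypothetical helper: convert a small bigint to uint64_t.
   Returns 0 on success, or -1 if the value does not fit. */
static int bigint_to_u64(const struct bigint *b, uint64_t *out)
{
    const unsigned char *digits = b->num_buffer;
    uint64_t v = 0;
    UV i;
    IV e;

    assert(b->exponent >= 0);
    for (i = 0; i < b->length; i++) {
        if (v > (UINT64_MAX - digits[i]) / 10)
            return -1;          /* v * 10 + digit would overflow */
        v = v * 10 + digits[i];
    }
    for (e = 0; e < b->exponent; e++) {
        if (v > UINT64_MAX / 10)
            return -1;          /* v * 10 would overflow */
        v *= 10;
    }
    *out = v;
    return 0;
}
```

For example, a bigint holding the single digit 2 with an exponent of 4532 reports 4533 digits but is rejected by C<bigint_to_u64>, while the digits 1, 2, 3 with an exponent of 2 convert to 12300.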

=head2 Floating point data types

Floating point data types are generically referred to as
C<NUM>s. There is a C<NUM> typedef that is guaranteed to hold any
floating point data type.

=over 4

=item Platform native float

These are whatever size float was chosen when perl was configured. The
C-level typedef C<NV> gives you one of these.

=item Arbitrary precision decimal numbers

Arbitrary precision decimal numbers, or bignums, can have any number
of digits before and after the decimal point. They are represented by
the structure:

   struct bignum {
     void *num_buffer;
     UV length;
     IV exponent;
     UV flags;
   };

and yes, this looks identical to the bigint structure. This isn't
accidental: with matching layouts, upgrading a bigint to a bignum
should be quick.

=begin question

Like the bigint structure, should we toss the data pointer and just
tack the data on the end?

=end question

=back
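
Since the two structures share field types and order, an upgrade can be a plain field-by-field copy. The C sketch below shows that as an illustration under stated assumptions rather than perl's actual code: C<IV>, C<UV> and C<NV> are stand-in typedefs, the digit layout is one decimal digit per byte, a negative exponent is assumed to place digits after the decimal point, and C<bigint_upgrade> and C<bignum_to_nv> are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-ins for perl's configured IV, UV and NV types. */
typedef intptr_t  IV;
typedef uintptr_t UV;
typedef double    NV;

struct bigint { void *num_buffer; UV length; IV exponent; UV flags; };
struct bignum { void *num_buffer; UV length; IV exponent; UV flags; };

/* Hypothetical upgrade routine: the two structures have the same
   fields in the same order, so upgrading is a field-by-field copy.
   The digit buffer is shared, not duplicated. */
static void bigint_upgrade(const struct bigint *in, struct bignum *out)
{
    assert(in != NULL && out != NULL);
    out->num_buffer = in->num_buffer;
    out->length     = in->length;
    out->exponent   = in->exponent;
    out->flags      = in->flags;
}

/* Hypothetical helper: approximate a bignum as an NV. Assumes one
   decimal digit per byte, most significant first; a negative
   exponent places digits after the decimal point. */
static NV bignum_to_nv(const struct bignum *n)
{
    const unsigned char *digits = n->num_buffer;
    NV v = 0.0;
    UV i;
    IV e;

    for (i = 0; i < n->length; i++)
        v = v * 10.0 + digits[i];
    for (e = n->exponent; e > 0; e--)
        v *= 10.0;              /* shift left one decimal place */
    for (e = n->exponent; e < 0; e++)
        v /= 10.0;              /* shift right one decimal place */
    return v;
}
```

The copy is written out field by field rather than with C<memcpy> because C does not formally guarantee that two distinct structure types share a layout, even when their members match; in practice compilers lay them out identically, so the cost is the same handful of moves. Sharing C<num_buffer> keeps the upgrade cheap, at the price that the bigint and the bignum must not both free the buffer.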

=head2 String data types

Perl has a single internal string form:

   struct perl_string {
     void *string_buffer;
     UV length;
     UV allocated;
     UV flags;
   };

The low three bits of the flags field are reserved for the type of the
string. The various types are:

=over 4

=item BINARY (0)

=item ASCII (1)

=item EBCDIC (2)

=item UTF_8 (3)

=item UTF_32 (4)

=item NATIVE_1 (5) through NATIVE_3 (7)

=back
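
Reading and writing the type in the low three bits is a matter of masking. The following C sketch encodes the eight type values listed above; the enum constant names, C<PSTR_TYPE_MASK>, C<pstr_type> and C<pstr_set_type> are hypothetical, and C<UV> is a stand-in typedef. Setting the type clears only the low three bits, so any other flags are preserved.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t UV;   /* assumption: stand-in for perl's UV */

struct perl_string {
    void *string_buffer;
    UV    length;
    UV    allocated;
    UV    flags;
};

/* The type values listed above (hypothetical constant names). */
enum perl_string_type {
    PSTR_BINARY   = 0, PSTR_ASCII    = 1, PSTR_EBCDIC   = 2,
    PSTR_UTF_8    = 3, PSTR_UTF_32   = 4, PSTR_NATIVE_1 = 5,
    PSTR_NATIVE_2 = 6, PSTR_NATIVE_3 = 7
};

/* The low three bits of flags hold the type. */
#define PSTR_TYPE_MASK ((UV)0x7)

static enum perl_string_type pstr_type(const struct perl_string *s)
{
    return (enum perl_string_type)(s->flags & PSTR_TYPE_MASK);
}

/* Replace the type bits without disturbing the other flag bits. */
static void pstr_set_type(struct perl_string *s, enum perl_string_type t)
{
    assert((UV)t <= PSTR_TYPE_MASK);
    s->flags = (s->flags & ~PSTR_TYPE_MASK) | (UV)t;
}
```

For instance, a string whose flags are 0x10 becomes 0x13 after its type is set to UTF_8, and 0x17 after it is set to NATIVE_3.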

It may be worth redefining the type values so that bit three always
indicates Unicode of some sort, or native encoding, or something
similar.

Perl may use the type field as an offset into a table of generic
string handling or conversion routines, which would allow us to
dynamically add in encodings. Or it might not, which is fine too.
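
One way the table idea could look is sketched below in C: an array of eight function pointers indexed by the type bits, with the native slots left empty until something registers a handler. Everything here is an assumption for illustration: the operation shown (counting characters), the function names, the stand-in C<UV> typedef, and the premise that C<length> counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t UV;   /* assumption: stand-in for perl's UV */

struct perl_string { void *string_buffer; UV length; UV allocated; UV flags; };

#define PSTR_TYPE_MASK ((UV)0x7)

/* Hypothetical per-encoding operation: count characters. */
typedef UV (*pstr_char_count_fn)(const struct perl_string *);

/* Single-byte encodings: one character per byte. */
static UV count_bytes(const struct perl_string *s) { return s->length; }

/* UTF-8: count every byte that is not a continuation byte (10xxxxxx). */
static UV count_utf8(const struct perl_string *s)
{
    const unsigned char *p = s->string_buffer;
    UV n = 0, i;
    for (i = 0; i < s->length; i++)
        if ((p[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* UTF-32: four bytes per character. */
static UV count_utf32(const struct perl_string *s) { return s->length / 4; }

/* One slot per three-bit type value; NULL means no handler yet. */
static pstr_char_count_fn char_count_table[8] = {
    count_bytes,        /* BINARY (0) */
    count_bytes,        /* ASCII (1) */
    count_bytes,        /* EBCDIC (2) */
    count_utf8,         /* UTF_8 (3) */
    count_utf32,        /* UTF_32 (4) */
    NULL, NULL, NULL    /* NATIVE_1 (5) through NATIVE_3 (7) */
};

/* Dynamically add (or replace) the handler for one encoding. */
static void pstr_register_encoding(UV type, pstr_char_count_fn fn)
{
    assert(type <= PSTR_TYPE_MASK);
    char_count_table[type] = fn;
}

/* Dispatch on the type bits of the string's flags. */
static UV pstr_char_count(const struct perl_string *s)
{
    pstr_char_count_fn fn = char_count_table[s->flags & PSTR_TYPE_MASK];
    assert(fn != NULL);
    return fn(s);
}
```

With this table, the six bytes of "héllo" in UTF-8 count as five characters when the string is typed UTF_8 and as six when it is typed BINARY; a native encoding only works once a handler has been registered for its slot.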

=head1 ATTACHMENTS

None

=head1 REFERENCES

The perl modules Math::BigInt and Math::BigFloat. The Unicode standard
at http://www.unicode.org.

=head1 GLOSSARY

=over 4

=item Type

Type refers to a low-level perl data type, such as a string or integer.

=item Class

Class refers to a higher-level piece of perl data. Each class has its
own vtable, which is a class' distinguishing mark. Classes live one
step below the perl source level, and should not be confused with perl
packages.

=item Package

A package is a perl source level construct.

=back