The POD below my sig is a proposed PDD on external data interfaces, that is, the way embedders and extenders will access Parrot's data types. It covers Strings, Buffers, and PMCs, as well as a few related functions.
Let me know what you think.

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

He who fights and runs away wasted valuable running time with the fighting.

=head1 TITLE

External Data Interfaces

=head1 VERSION

1.0

=head2 CURRENT

    Maintainer: Brent Dax <[EMAIL PROTECTED]>
    Class: Internals
    PDD Number: TBD
    Version: 1.0
    Status: Proposed
    Last Modified: 13 August 2002
    PDD Format: 1
    Language: English

=head2 HISTORY

=over 4

=item version 1

None. First version

=back

=head1 CHANGES

=over 4

=item Version 1.0

None. First version

=back

=head1 ABSTRACT

This PDD describes the external interfaces to Parrot data structures, such as PMCs and Strings. These interfaces are shared by the embedding and extending systems.

=head1 DESCRIPTION

One of the major flaws of Perl 5 was that the extension interfaces were, for lack of a better term, "raw". The same interfaces were used by extenders and core developers; this necessitated much gnashing of teeth when a function used by extenders was no longer needed or proved insufficient for a task--and sweeping changes were next to impossible.

One of the intents of Parrot is to provide much cleaner extension interfaces. Most other languages in Perl's class have clean extension interfaces, where the internal functions aren't used by extenders and the external functions aren't used by internals developers.

This PDD describes the parts of the overall embedding/extending interface related to user-level data; these are defined separately from embedding and extending interfaces because they are shared by both. "User-level data" is defined to include PMCs, Strings, and Buffers.

The design of the external data interfaces has two major objectives:

=over 4

=item 1.

To be small and simple.

=item 2.

To be complete.

=back

Obviously, these two goals conflict. For this reason, there isn't much redundancy in the interfaces. For example, all keyed PMC functions accept only PMCs as sources, indices, and destinations.

=head1 IMPLEMENTATION

=head2 Strings

Parrot-level C<String>s are to be represented by the type C<Parrot_String>. This type is defined to be a pointer to a C<struct parrot_string_t>.

The functions for creating and manipulating C<Parrot_String>s are listed below.

=over 4

=item C<Parrot_String Parrot_string_new(Parrot_Interp, char* bytes, Parrot_Int len, Parrot_String enc)>

Allocates a Parrot_String and sets it to the first C<len> bytes of C<bytes>. C<enc> is the name of the encoding to use (e.g. "ASCII", "UTF-8", "Shift-JIS"); if a case-insensitive match of this name doesn't result in an encoding name that Parrot knows about, or if NULL is passed as the encoding, the platform's default encoding is assumed.[1] Values of NULL and 0 can be passed in for C<bytes> and C<len> if the user desires an empty string.

Note that it is rarely a good idea to omit the encoding if you're using C<bytes> and C<len>.

=item C<Parrot_String Parrot_string_copy(Parrot_Interp, Parrot_String dest, Parrot_String src)>

Sets C<dest> to C<src> and returns C<dest>. If C<dest> is NULL, a new Parrot_String is allocated, operated on and returned. If C<dest> and C<src> are the same, this is a no-op.

This may or may not be a copy-on-write set; the embedder should not care. B<XXX> Is this a good policy?

=item C<Parrot_String Parrot_string_copy_bytes(Parrot_Interp, Parrot_String dest, char* bytes, Parrot_Int len, char* enc)>

Sets C<dest> to the first C<len> bytes of C<bytes> and returns C<dest>. C<enc> is taken to be the encoding of C<bytes>; the Parrot_String will retain its original encoding. (Call C<Parrot_string_transcode> on the Parrot_String first if you want to retain C<enc>.)

=item C<Parrot_String Parrot_string_encoding(Parrot_Interp, Parrot_String str)>

Returns the encoding of C<str> as a Parrot_String.

=item C<void Parrot_string_transcode(Parrot_Interp, Parrot_String str, Parrot_String enc)>

Transcode C<str> to C<enc>.
If C<enc> isn't recognized as a valid encoding name by a case-insensitive match, or if it is NULL, the default encoding is used.

=item C<Parrot_String Parrot_string_concat(Parrot_Interp, Parrot_String dest, Parrot_String lhs, Parrot_String rhs)>

Set C<dest> to the concatenation of C<lhs> and C<rhs> and return the value of C<dest>. If C<dest> is NULL, a new Parrot_String is allocated, operated on and returned. C<dest>'s value may be the same as either or both of C<lhs> and C<rhs>.

=item C<Parrot_String Parrot_string_chop(Parrot_Interp, Parrot_String dest, Parrot_String lhs, Parrot_Int len)>

Copy C<lhs> to C<dest> and remove the last C<len> characters from it, returning C<dest>. If C<dest> is NULL, a new Parrot_String is allocated, operated on and returned.

=item C<Parrot_UInt Parrot_string_length(Parrot_Interp, Parrot_String str)>

Returns the length of C<str> in characters. Note that this is "characters", not "bytes"; the string's encoding defines what "character" means.

=item C<Parrot_UInt Parrot_string_ord(Parrot_Interp, Parrot_String str, Parrot_UInt index)>

Returns the value of the character at C<index> in C<str>. Note that this is "character", not "byte"; the string's encoding defines what "character" means.

=item C<Parrot_String Parrot_string_substr(Parrot_Interp, Parrot_String dest, Parrot_String str, Parrot_UInt index, Parrot_UInt len)>

Sets C<dest> to the substring of C<str> starting at character C<index> and continuing for C<len> characters and returns C<dest>. Note that this is "characters", not "bytes"; the string's encoding defines what "character" means. If C<dest> is NULL, a new Parrot_String is allocated, operated on and returned.

=item C<void Parrot_string_replace(Parrot_Interp, Parrot_String str, Parrot_UInt index, Parrot_UInt len, Parrot_String rep)>

Replaces the substring of C<str> starting at character C<index> and continuing for C<len> characters with the value of C<rep>.
Note that this is "characters", not "bytes"; the string's encoding defines what "character" means. C<rep> need not be the same length as the substring being replaced.

=item C<Parrot_String Parrot_string_from_cstr(Parrot_Interp, char* cstr)>

Creates a Parrot_String from the given C string. Assumes the native encoding.

=item C<char* Parrot_string_to_cstr(Parrot_Interp, Parrot_String str)>

Creates a null-terminated C string from the given Parrot_String. If necessary, transcodes to the native encoding.

Use of this function is discouraged for several reasons--information can be lost in the transcoding and null characters in the string can cause problems. However, this function is sometimes necessary, so it's included.

The storage for the C string is created with C<Parrot_alloc()> and must be freed with C<Parrot_free()>.

=back

=head2 Buffers

Parrot-level C<Buffer>s are to be represented by the type C<Parrot_Buffer>. This is defined to be a pointer to a C<struct parrot_buffer_t>.

The functions for creating and manipulating C<Parrot_Buffer>s are listed below.

=over 4

=item C<Parrot_Buffer Parrot_buffer_new(Parrot_Interp, Parrot_UInt size)>

Allocates a new C<Parrot_Buffer> with C<size> bytes of memory in it.

=item C<void Parrot_buffer_resize(Parrot_Interp, Parrot_Buffer buf, Parrot_UInt newsize)>

Allocates C<newsize> bytes of memory, copies the contents of C<buf> to it, and places the new memory into C<buf>.

=item C<Parrot_Buffer Parrot_buffer_copy(Parrot_Interp, Parrot_Buffer dest, Parrot_Buffer src)>

Copies the contents of C<src> into C<dest>, resizing C<dest> if necessary, and returns C<dest>. If C<dest> is NULL, a new Parrot_Buffer is allocated, operated on and returned.

=item C<Parrot_UInt Parrot_buffer_size(Parrot_Interp, Parrot_Buffer buf)>

Returns the size of the contents of C<buf>.

=item C<void* Parrot_buffer_contents(Parrot_Interp, Parrot_Buffer buf)>

Returns a pointer to the contents of C<buf>. This pointer can be used to directly manipulate C<buf>'s contents.
B<Warning>: Make sure to block the garbage collector before calling this function! Otherwise, the pointer may become invalid, resulting in badness ranging from losing data to core dumps.

B<Warning>: Make sure that this pointer isn't used after garbage collection is unblocked!

=back

=head2 PMCs

Parrot-level C<PMC>s are to be represented by the type C<Parrot_PMC>. This is defined to be a pointer to a C<struct parrot_pmc_t>.

The functions for creating and manipulating C<Parrot_PMC>s are listed below.

=over 4

=item C<Parrot_PMC Parrot_pmc_new(Parrot_Interp, Parrot_String type)>

Creates a new Parrot_PMC of the type C<type>. If C<type> is not a case-insensitive match of any type already registered with Parrot, this function will throw an exception.

=item C<Parrot_PMC Parrot_pmc_new_vtable(Parrot_Interp, Parrot_VTable vtable)>

Creates a new Parrot_PMC using C<vtable>. This can be used for "private" PMC types. B<XXX> Is this a good idea or not?

=item C<Parrot_Int Parrot_pmc_get_integer(Parrot_Interp, Parrot_PMC src)>

Returns the result of C<< src->vtable->get_integer() >>.

=item C<Parrot_Float Parrot_pmc_get_number(Parrot_Interp, Parrot_PMC src)>

Returns the result of C<< src->vtable->get_number() >>.

=item C<Parrot_String Parrot_pmc_get_string(Parrot_Interp, Parrot_PMC src)>

Returns the result of C<< src->vtable->get_string() >>.

=item C<Parrot_PMC Parrot_pmc_get_pmc(Parrot_Interp, Parrot_PMC src)>

Returns the result of C<< src->vtable->get_pmc() >>.

=item C<Parrot_PMC Parrot_pmc_set_integer(Parrot_Interp, Parrot_PMC dest, Parrot_Int src)>

Calls C<< dest->vtable->set_integer(src) >> and returns C<dest>.[2]

=item C<Parrot_PMC Parrot_pmc_set_number(Parrot_Interp, Parrot_PMC dest, Parrot_Float src)>

Calls C<< dest->vtable->set_number(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_set_string(Parrot_Interp, Parrot_PMC dest, Parrot_String src)>

Calls C<< dest->vtable->set_string(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_set_pmc(Parrot_Interp, Parrot_PMC dest, Parrot_PMC src)>

Calls C<< dest->vtable->set_pmc(src) >> and returns C<dest>.

=item C<Parrot_PMC Parrot_pmc_get_indexed(Parrot_Interp, Parrot_PMC src, Parrot_PMC index)>

Constructs a key from C<index> and calls C<< src->vtable->get_pmc_keyed(key) >>.[3]

=item C<Parrot_PMC Parrot_pmc_get_indexed_i(Parrot_Interp, Parrot_PMC src, Parrot_Int index)>

Calls C<< src->vtable->get_pmc_keyed_integer(index) >>.

=item C<Parrot_PMC Parrot_pmc_set_indexed(Parrot_Interp, Parrot_PMC dest, Parrot_PMC index, Parrot_PMC src)>

Constructs a key from C<index> and calls C<< dest->vtable->set_pmc_keyed(key, src, NULL) >>.

=item C<Parrot_PMC Parrot_pmc_set_indexed_i(Parrot_Interp, Parrot_PMC dest, Parrot_Int index, Parrot_PMC src)>

Calls C<< dest->vtable->set_pmc_keyed_integer(index, src, 0) >>.

=item C<Parrot_PMC Parrot_pmc_call(Parrot_Interp, Parrot_PMC sub, Parrot_PMC args)>

Pushes C<args> onto the stack, calls C<sub>, pops the return value(s) off the stack, and returns them.

=item C<Parrot_PMC Parrot_pmc_methcall(Parrot_Interp, Parrot_PMC object, Parrot_String method, Parrot_PMC args)>

Finds C<method> in C<object>, pushes C<object> and C<args> onto the stack, calls the method, pops the return value(s) off the stack, and returns them.

=back

=head2 Miscellanea

=over 4

=item C<void *Parrot_alloc(Parrot_UInt size)>

Calls the system C<malloc()> with C<size>.

=item C<void Parrot_free(void * ptr)>

Calls the system C<free()> with C<ptr>.

=item C<void Parrot_block_gc(Parrot_Interp)>

Blocks the garbage collector on the selected interpreter. Note that this is done by incrementing a counter, so three calls to C<Parrot_block_gc()> require three calls to C<Parrot_unblock_gc()> before GC is reactivated.

=item C<void Parrot_unblock_gc(Parrot_Interp)>

Unblocks the garbage collector on the selected interpreter.

=back

=head1 ATTACHMENTS

None.
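
As an illustration only, the counting behaviour described for C<Parrot_block_gc()> and C<Parrot_unblock_gc()> under Miscellanea could be sketched roughly as follows. This is not Parrot's implementation: the C<sketch_interp> struct, its C<gc_block_level> field, and every C<sketch_*> function name are hypothetical stand-ins, and the guard against an unbalanced unblock call is an assumption that this PDD does not specify.

```c
#include <assert.h>

/* Hypothetical stand-in for an interpreter; only the GC block counter
 * is modelled here. Not a real Parrot structure. */
typedef struct sketch_interp {
    unsigned int gc_block_level;   /* 0 means GC is allowed to run */
} sketch_interp;

/* Block GC by incrementing the counter; calls nest. */
void sketch_block_gc(sketch_interp *interp)
{
    interp->gc_block_level++;
}

/* Undo one earlier block. Assumption: an unbalanced call is ignored
 * rather than allowed to wrap the unsigned counter around. */
void sketch_unblock_gc(sketch_interp *interp)
{
    if (interp->gc_block_level > 0)
        interp->gc_block_level--;
}

/* Nonzero while at least one block is still outstanding. */
int sketch_gc_is_blocked(const sketch_interp *interp)
{
    return interp->gc_block_level != 0;
}
```

With a counter like this, code that blocks GC around a call to C<Parrot_buffer_contents()> can itself be called from code that has already blocked GC; the collector only becomes active again once the outermost caller unblocks.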

=head1 FOOTNOTES

[1] A string is used so that Parrot can support pluggable string encodings but still degrade gracefully if the given encoding hasn't been plugged in.

[2] This allows for code like C<Parrot_PMC mypmc = Parrot_pmc_set_integer(interp, Parrot_pmc_new(interp, Parrot_string_from_cstr(interp, "PerlInt")), 1)>.

[3] Note how limited keyed support is. This is to keep things simple. I thought about doing combinations of return types and key types, but that caused a combinatorial explosion, and I didn't think it wise to expose keys to the outside.

=head1 REFERENCES

PDD 10 (Embedding)

PDD 11 (Extending)

L<perlembed>, L<perlxs>