Nicholas Clark:
# > The functions for creating and manipulating C<Parrot_String>s are
# > listed below.
#
# Is it worth arranging a reminder in here that as parrot is
# garbage collected there is no confusion about who owns
# pointers to blah?

Probably. (Actually, I'd probably put it in a section above this, as it doesn't necessarily go with strings in particular.)

# > =item C<Parrot_String Parrot_string_new(Parrot_Interp, char* bytes,
# > Parrot_Int len, Parrot_String enc)>
# >
# Should that char * be const char *?

*shrugs* I never really understood the distinction. :^)

# > Note that it is rarely a good idea to not specify the encoding if
# > you're using C<bytes> and C<len>.
#
# I'm a native English speaker and I'm finding that double
# negative hard to work out. Is there a clearer way to phrase it?

"If C<bytes> and C<len> are used, specifying C<enc> is usually a good idea."

# > C<src> are the same, this is a noop. This may or may not be a
# > copy-on-write set; the embedder should not care.
#
# "This might be a copy-on-write set" ...
#
# And do we need an RFC-like definition of should/may/must/mustn't?

If so, I'd suggest the definition be patched into PDD0, so it's shared by all PDDs instead of repeating the definitions everywhere.

# In which case, surely that should read "the embedder must not care"?

I don't want to say "must". If they do care, they're free to include internals headers as they see fit--and deal with all the maintenance hassles this causes.

# > B<XXX> Is this a good policy?
# >
# > =item C<Parrot_String Parrot_string_copy_bytes(Parrot_Interp,
# > Parrot_String dest, char* bytes, Parrot_Int len, char* enc)>
#
# Again, should that be const char *bytes?

Again, I dunno. :^)

# > =item C<void Parrot_string_transcode(Parrot_Interp, Parrot_String str,
# > Parrot_String enc)>
# >
# > Transcode C<str> to C<enc>. If C<enc> isn't recognized as a valid
# > encoding name by a case-insensitive match, or if it is NULL, the
# > default encoding is used.
#
# Encodings are specified in parrot strings (not char *) yet
# you state that it's case insensitive. Is case insensitivity
# well defined on an encoding basis, or is it actually
# dependent on the language level?
# [eg one might argue that in
# English þ and Þ aren't the same, but if the string is in
# ISO-8859-1 then Parrot isn't going to know whether the name
# was specified in English, German or Icelandic. I chose þ
# because I don't think there are any foreign words adopted
# into English spelled with thorn. Whereas I'd not be surprised
# if most other accented letters are used in some or other word]
#
# Independent of that, aren't we opening ourselves up to a big
# performance hit by doing case insensitive matching on
# arbitrary encodings (such as Unicode)? Which normal form were
# we going to do it in? And if the canonical name is defined in
# (say) ISO 8859-1 but their string is in Unicode, are we going
# to convert before deciding whether it is the same? And if
# they're in Shift-JIS but we're supplying it in ISO-8859-2 -
# that's 2 conversions?
#
# It seems faster having names as US-ASCII and being case
# insensitive, or having names case sensitive.

We can be case-sensitive. I'd rather not be encoding-sensitive, but that's okay too if need be.

# > =item C<Parrot_UInt Parrot_string_length(Parrot_Interp, Parrot_String
# > str)>
# >
# > Returns the length of C<str> in characters. Note that this is
# > "characters", not "bytes"; the string's encoding defines what
# > "character" means.
#
# Should you be clear what happens with combining characters?
# If so, that's "characters", not "bytes" or "glyphs", isn't it?

That's what I mean by "the encoding decides". I would imagine that Unicode encodings wouldn't count combining characters, but I don't know enough to make an informed decision about that.

# Is there a cross reference to what a Parrot_UInt is?

I should include a section defining Parrot_Int, Parrot_UInt, and Parrot_Float.

# > =item C<Parrot_String Parrot_string_from_cstr(Parrot_Interp, char*
# > cstr)>
# >
# > Creates a Parrot_String from the given C string. Assumes the native
# > encoding.
#
# const char* ?

*shrugs*

# > =item C<Parrot_PMC Parrot_pmc_new_vtable(Parrot_Interp, Parrot_VTable
# > vtable)>
# >
# > Creates a new Parrot_PMC using C<vtable>. This can be used for
# > "private" PMC types.
# >
# > B<XXX> Is this a good idea or not?
#
# Singletons are considered useful in some languages, aren't
# they? Without this, would it be hard to efficiently create singletons?

What this really deals with is if I want a custom PMC type without registering it.

# > =item C<void *Parrot_alloc(Parrot_UInt size)>
# >
# > Calls the system C<malloc()> with C<size>.
#
# Are you sure you want to set that in stone? "Calls the system
# malloc or equivalent" IIRC on Win32 perl5 supplies a malloc
# that tracks which (i)thread allocates memory, and frees all
# memory on ithread exit. And perl5 comes with its own malloc,
# which it often likes to use on *nix.

It should probably say "or equivalent".

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

"Java golf. That'd be a laugh. 'Look, I done it in 15!' 'Characters?' 'No, classes!'"
--Ferret, in the Monastery