RE: [DRAFT PPD] External Data Interfaces

Brent Dax Sun, 18 Aug 2002 15:16:06 -0700

Nicholas Clark:
# > The functions for creating and manipulating C<Parrot_String>s are 
# > listed below.
# 
# Is it worth arranging a reminder in here that as parrot is 
# garbage collected there is no confusion about who owns 
# pointers to blah?

Probably.  (Actually, I'd probably put it in a section above this, as it
doesn't necessarily go with strings in particular.)

# > =item C<Parrot_String Parrot_string_new(Parrot_Interp, char* bytes, 
# > Parrot_Int len, Parrot_String enc)>
# > 
# Should that char * be const char *?

*shrugs*  I never really understood the distinction.  :^)
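
For reference, a const char * parameter is a promise that the function
will not write through the pointer, so callers can pass string literals
and read-only buffers without a cast. Here is a minimal sketch of the
idea; count_spaces is a hypothetical helper made up for illustration and
is not part of the draft API:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the draft API. Because the parameter
 * is const char *, the function may read the bytes but not modify them. */
size_t count_spaces(const char *bytes, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] == ' ')
            n++;
        /* bytes[i] = '_';  <- would be rejected by the compiler */
    }
    return n;
}
```

If Parrot_string_new only reads from bytes, const char * would document
that in the signature itself.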

# > Note that it is rarely a good idea to not specify the encoding if 
# > you're using C<bytes> and C<len>.
# 
# I'm a native English speaker and I'm finding that double 
# negative hard to work out. Is there a clearer way to phrase it?

"If C<bytes> and C<len> are used, specifying C<enc> is usually a good
idea."

# > C<src> are the same, this is a noop.  This may or may not be a 
# > copy-on-write set; the embedder should not care.
# 
# "This might be a copy-on-write set" ...
# 
# And do we need a RFC like definition of should/may/must/mustn't?

If so, I'd suggest the definition be patched into PDD0, so it's shared
by all PDDs instead of repeating the definitions everywhere.

# In which case, surely that should read "the embedder must not care"?

I don't want to say "must".  If they do care, they're free to include
internals headers as they see fit--and deal with all the maintenance
hassles this causes.

# > B<XXX> Is this a good policy?
# > 
# > =item C<Parrot_String Parrot_string_copy_bytes(Parrot_Interp,
# > Parrot_String dest, char* bytes, Parrot_Int len, char* enc)>
#
# Again, should that be const char *bytes?

Again, I dunno.  :^)

# > =item C<void Parrot_string_transcode(Parrot_Interp,
# > Parrot_String str, Parrot_String enc)>
# > 
# > Transcode C<str> to C<enc>.  If C<enc> isn't recognized as a valid 
# > encoding name by a case-insensitive match, or if it is NULL, the 
# > default encoding is used.
# 
# Encodings are specified in parrot strings (not char *) yet 
# you state that it's case insensitive. Is case insensitivity 
# well defined on an encoding basis, or is it actually 
# dependent on the language level? [eg one might argue that in 
# English þ and Þ aren't the same, but if the string is in 
# ISO-8859-1 then Parrot isn't going to know whether the name 
# was specified in English, German or Icelandic. I chose þ 
# because I don't think there are any foreign words adopted 
# into English spelled with thorn. Whereas I'd not be surprised 
# if most other accented letters are used in some or other word]
# 
# Independent of that, aren't we opening ourselves up to a big 
# performance hit by doing case insensitive matching on 
# arbitrary encodings (such as Unicode)? Which normal form were 
# we going to do it in? And if the canonical name is defined in 
# (say) ISO 8859-1 but their string is in Unicode, are we going 
# to convert before deciding whether it is the same? And if 
# they're in Shift-JIS but we're supplying it in ISO-8859-2 - 
# that's 2 conversions?
# 
# It seems faster having names as US-ASCII and being case 
# insensitive, or having names case sensitive.

We can be case-sensitive.  I'd rather not be encoding-sensitive, but
that's okay too if need be.
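
To make the US-ASCII option from the quoted text concrete, here is a
rough sketch of a case-insensitive comparison that folds only the letters
A-Z and so never depends on locale or normal form. It assumes encoding
names are NUL-terminated US-ASCII C strings, which the draft does not
say, and both function names are hypothetical:

```c
/* Fold one US-ASCII capital letter to lower case; all other bytes are
 * returned unchanged. Deliberately avoids tolower(), whose behaviour
 * depends on the current locale. */
unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: returns 1 if two NUL-terminated encoding names
 * are equal ignoring US-ASCII case, 0 otherwise. */
int enc_name_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (ascii_lower((unsigned char)*a) != ascii_lower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both names end together */
}
```

A case-sensitive scheme would reduce to a plain strcmp() and needs no
folding at all.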

# > =item C<Parrot_UInt Parrot_string_length(Parrot_Interp,
# > Parrot_String str)>
# > 
# > Returns the length of C<str> in characters.  Note that this is 
# > "characters", not "bytes"; the string's encoding defines what 
# > "character" means.
# 
# Should you be clear what happens with combining characters?
# If so, that's "characters", not "bytes" or "glyphs", isn't it?

That's what I mean by "the encoding decides".  I would imagine that
Unicode encodings wouldn't count combining characters, but I don't know
enough to make an informed decision about that.
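
To show how the candidate answers can differ, here is a small sketch for
UTF-8 only. It counts code points by skipping continuation bytes and
assumes well-formed input. It is not what the draft specifies, and
utf8_codepoints is a made-up name:

```c
#include <stddef.h>

/* Count code points in a UTF-8 buffer. Continuation bytes have the bit
 * pattern 10xxxxxx, so every byte that does not match it starts a new
 * code point. Assumes the input is well-formed UTF-8. */
size_t utf8_codepoints(const char *bytes, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (((unsigned char)bytes[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "e" followed by U+0301 (combining acute accent, bytes 0xCC 0x81) this
gives 2 code points from 3 bytes, although a reader sees one glyph.
Whether Parrot_string_length should answer 2 or 1 there is exactly the
open question.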

# Is there a cross reference to what a Parrot_UInt is?

I should include a section defining Parrot_Int, Parrot_UInt, and
Parrot_Float.

# > =item C<Parrot_String Parrot_string_from_cstr(Parrot_Interp, char* 
# > cstr)>
# > 
# > Creates a Parrot_String from the given C string.  Assumes the native
# > encoding.
# 
# const char* ?

*shrugs*

# > =item C<Parrot_PMC Parrot_pmc_new_vtable(Parrot_Interp,
# > Parrot_VTable vtable)>
# > 
# > Creates a new Parrot_PMC using C<vtable>.  This can be used for 
# > "private" PMC types.
# >
# > B<XXX> Is this a good idea or not?
# 
# Singletons are considered useful in some languages, aren't 
# they? Without this, would it be hard to efficiently create singletons?

What this really covers is the case where I want a custom PMC type
without registering it.

# > =item C<void *Parrot_alloc(Parrot_UInt size)>
# > 
# > Calls the system C<malloc()> with C<size>.
# 
# Are you sure you want to set that in stone? "Calls the system 
# malloc or equivalent". IIRC on Win32 perl5 supplies a malloc 
# that tracks which (i)thread allocates memory, and frees all 
# memory on ithread exit. And perl5 comes with its own malloc, 
# which it often likes to use on *nix.

It should probably say "or equivalent".
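
One way to read "or equivalent" is that the allocator sits behind a
function pointer defaulting to the system malloc(). The sketch below is
only an illustration of that reading. None of these names (alloc_fn,
set_alloc, sketch_alloc, counting_alloc) appear in the draft, and
sketch_alloc merely stands in for Parrot_alloc:

```c
#include <stdlib.h>

/* Hypothetical hook type; the draft does not define one. */
typedef void *(*alloc_fn)(size_t size);

/* Currently installed allocator; defaults to the system malloc(). */
static alloc_fn current_alloc = malloc;

/* Hypothetical setter: install a replacement, or pass NULL to go back
 * to the system malloc(). */
void set_alloc(alloc_fn fn)
{
    current_alloc = fn ? fn : malloc;
}

/* Stand-in for Parrot_alloc: defers to whichever allocator is installed. */
void *sketch_alloc(size_t size)
{
    return current_alloc(size);
}

/* Example replacement that counts calls before deferring to malloc(). */
static size_t alloc_calls = 0;

void *counting_alloc(size_t size)
{
    alloc_calls++;
    return malloc(size);
}
```

With wording like that in the PDD, a tracking allocator such as the
Win32 perl5 one mentioned above could be installed without changing what
embedders call.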

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

"Java golf. That'd be a laugh. 'Look, I done it in 15!' 'Characters?'
'No, classes!'"
    --Ferret, in the Monastery
