These are the details for the encoding API. This is the layer that mediates between Parrot, which sees a string as a sequence of codepoints, and the low-level buffer, which is filled with bytes.

Note that the charset layer lives above this, but since I've not finished that part yet I figure it's better to post the finished piece than to wait even longer. Please note that comments are *very* welcome -- I want to get this right the first time so we can get it in and stop worrying about it.

Also note that while all these are presented as functions, they're really entries in a function table, so translate in your heads accordingly.
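To make the function-table point concrete, here's a rough sketch of what such a table might look like. The struct layouts, the field names, and the "fixed8" one-byte-per-codepoint encoding are all made up for illustration, and UINTVAL is just a stand-in typedef; none of this is the real Parrot definition.

```c
#include <assert.h>

typedef unsigned long UINTVAL;     /* stand-in for Parrot's UINTVAL (assumption) */
typedef struct STRING STRING;
typedef struct ENCODING ENCODING;

/* Hypothetical function table: one slot per encoding entry point. */
struct ENCODING {
    const char *name;
    UINTVAL (*get_codepoint)(STRING *, UINTVAL offset);
    void    (*set_codepoint)(STRING *, UINTVAL offset, UINTVAL codepoint);
    UINTVAL (*codepoints)(STRING *);
    UINTVAL (*bytes)(STRING *);
};

/* Hypothetical minimal STRING: a byte buffer, its size, and its table. */
struct STRING {
    unsigned char  *buf;
    UINTVAL         len;           /* size in bytes */
    const ENCODING *enc;
};

/* A trivial fixed-width 8-bit encoding: every byte is one codepoint. */
static UINTVAL fixed8_get_codepoint(STRING *s, UINTVAL offset)
{
    assert(offset < s->len);
    return s->buf[offset];
}

static void fixed8_set_codepoint(STRING *s, UINTVAL offset, UINTVAL codepoint)
{
    assert(offset < s->len && codepoint <= 0xFF);
    s->buf[offset] = (unsigned char)codepoint;
}

/* For this encoding the byte count and the codepoint count coincide. */
static UINTVAL fixed8_size(STRING *s)
{
    return s->len;
}

static const ENCODING fixed8 = {
    "fixed8",
    fixed8_get_codepoint,
    fixed8_set_codepoint,
    fixed8_size,
    fixed8_size
};
```

A caller would then write s->enc->get_codepoint(s, 1) rather than calling get_codepoint directly, which is the translation step mentioned above.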

And note again that the functions are all shadowed by charset functions. So we really call the charset versions of these, which may then call through to the encoding's functions, so the charsets can pitch a fit if you do something they don't like. (For example, turning a Shift-JIS string into UTF-8 or something, if the charset even cares. It probably won't, but you never know. The charset code will probably want to get in the way of byte setting if it's a multibyte charset, or of codepoint setting if it's a set with combining characters.)
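A minimal sketch of that shadowing, assuming a hypothetical CHARSET struct with an is_multibyte flag and a pointer to the encoding-level set_byte. The int return code stands in for "pitching a fit"; how the charset layer would actually report the problem isn't specified here.

```c
#include <assert.h>

typedef unsigned long UINTVAL;     /* stand-in for Parrot's UINTVAL (assumption) */

/* Hypothetical minimal STRING: a byte buffer and its size in bytes. */
typedef struct STRING {
    unsigned char *buf;
    UINTVAL        len;
} STRING;

/* Encoding-level entry: writes the byte, no questions asked. */
static void enc_set_byte(STRING *s, UINTVAL offset, UINTVAL byte)
{
    assert(offset < s->len);
    s->buf[offset] = (unsigned char)byte;
}

/* Hypothetical charset descriptor that shadows the encoding entry. */
typedef struct CHARSET {
    int   is_multibyte;
    void (*enc_set_byte)(STRING *, UINTVAL offset, UINTVAL byte);
} CHARSET;

/* Charset-level entry: objects first, then calls through to the encoding.
   Returns 0 on success and -1 if the charset refuses (illustrative only). */
static int charset_set_byte(const CHARSET *cs, STRING *s,
                            UINTVAL offset, UINTVAL byte)
{
    if (cs->is_multibyte)
        return -1;                 /* poking raw bytes could split a character */
    if (offset >= s->len || byte > 0xFF)
        return -1;
    cs->enc_set_byte(s, offset, byte);
    return 0;
}
```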

Generally only the charset code will call these anyway.

 void to_encoding(STRING *);

   Convert the string to the new encoding, in place.

 STRING *copy_to_encoding(STRING *);

   Make a copy of the string, in the new encoding.
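As an illustration of the copying variant, here's a sketch that converts a Latin-1 byte buffer to a freshly allocated UTF-8 buffer and leaves the source alone. It works on raw buffers rather than a real STRING, and the function name and the out_len parameter are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Copy a Latin-1 buffer into a newly malloc'd UTF-8 buffer.
   The caller frees the result. Returns NULL if allocation fails;
   otherwise *out_len receives the size of the copy in bytes. */
static unsigned char *latin1_copy_to_utf8(const unsigned char *src,
                                          size_t len, size_t *out_len)
{
    unsigned char *dst;
    size_t i, j = 0;

    assert(src != NULL || len == 0);

    /* Worst case is two output bytes per input byte; +1 avoids malloc(0). */
    dst = (unsigned char *)malloc(len * 2 + 1);
    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[j++] = src[i];                                   /* ASCII: unchanged */
        } else {
            dst[j++] = (unsigned char)(0xC0 | (src[i] >> 6));    /* lead byte */
            dst[j++] = (unsigned char)(0x80 | (src[i] & 0x3F));  /* continuation */
        }
    }
    *out_len = j;
    return dst;
}
```

The in-place version would do the same conversion and then swap the new buffer into the existing string header.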

 UINTVAL get_codepoint(STRING *, offset);

   Return the codepoint at offset.
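For a variable-width encoding this is where the real work happens. A sketch for UTF-8, assuming the offset is counted in codepoints (the note above doesn't say) and that the buffer is already well-formed. The function name and the (UINTVAL)-1 out-of-range return value are inventions for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UINTVAL;     /* stand-in for Parrot's UINTVAL (assumption) */

/* Return codepoint number `offset` (0-based) from well-formed UTF-8,
   or (UINTVAL)-1 if the offset is past the end of the buffer. */
static UINTVAL utf8_get_codepoint(const unsigned char *buf, size_t len,
                                  size_t offset)
{
    size_t i = 0;
    size_t extra;
    unsigned char lead;
    UINTVAL cp;

    assert(buf != NULL || len == 0);

    /* Skip `offset` codepoints: step over a lead byte, then over any
       continuation bytes (those of the form 10xxxxxx). */
    while (i < len && offset > 0) {
        i++;
        while (i < len && (buf[i] & 0xC0) == 0x80)
            i++;
        offset--;
    }
    if (i >= len)
        return (UINTVAL)-1;

    /* The lead byte says how many continuation bytes follow. */
    lead = buf[i];
    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else                            { cp = lead & 0x07; extra = 3; }

    /* Each continuation byte contributes six more bits. */
    while (extra-- > 0 && ++i < len)
        cp = (cp << 6) | (UINTVAL)(buf[i] & 0x3F);

    return cp;
}
```

Note the linear scan: in a variable-width encoding, finding codepoint N means walking the buffer from the start unless the string caches positions somewhere.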

 void set_codepoint(STRING *, offset, UINTVAL codepoint);

   Set the codepoint at offset to codepoint.

 UINTVAL get_byte(STRING *, offset);

   Get the byte at offset.

 void set_byte(STRING *, offset, UINTVAL byte);

   Set the byte at offset to byte.

 STRING *get_codepoints(STRING *, offset, count);

   Get count codepoints starting at offset, returned as a STRING of no
   charset. (If called through the charset code, the returned string may be
   put into a charset if that's a valid thing to do.)

 STRING *get_bytes(STRING *, offset, count);

   Get count bytes starting at offset, returned as a binary STRING.

 void set_codepoints(STRING *, offset, count, STRING *codepointstring);

   Set count codepoints, starting at offset, to the contents of the
   codepoint string.

 void set_bytes(STRING *, offset, count, STRING *binarystring);

   Set count bytes, starting at offset, to the contents of the binary
   string.

 void become_encoding(STRING *);

   Assume the string is the new encoding and make it so. Validate first
   and throw an exception if this assumption is incorrect.
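The validation step is the interesting part here. A sketch of what it might look like if the new encoding is UTF-8, working on a raw buffer and returning 1 or 0 rather than throwing (the exception mechanism is outside the scope of this example). The function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
   Rejects stray or missing continuation bytes, overlong forms,
   surrogates, and anything above U+10FFFF. */
static int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char lead = buf[i];
        unsigned long cp, min;
        size_t extra, k;

        /* min is the smallest codepoint that needs this many bytes. */
        if (lead < 0x80)                { cp = lead;        extra = 0; min = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min = 0x10000; }
        else return 0;                  /* stray continuation or bad lead byte */

        if (extra > len - i - 1)
            return 0;                   /* sequence runs off the end */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;               /* expected a continuation byte */
            cp = (cp << 6) | (unsigned long)(buf[i + k] & 0x3F);
        }

        if (cp < min)                      return 0;   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)  return 0;   /* surrogate */
        if (cp > 0x10FFFF)                 return 0;   /* out of range */

        i += extra + 1;
    }
    return 1;
}
```

If the check passes, become_encoding only has to relabel the string; the bytes themselves don't change, which is what separates it from to_encoding.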

 UINTVAL codepoints(STRING *);

   Return the size in codepoints.

 UINTVAL bytes(STRING *);

   Return the size in bytes.
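The two sizes only differ for variable-width encodings. A quick sketch for UTF-8, again on a raw buffer with a made-up function name and assuming well-formed input: the byte size is just the buffer length, while the codepoint size means counting every byte that isn't a continuation byte.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long UINTVAL;     /* stand-in for Parrot's UINTVAL (assumption) */

/* Count codepoints in well-formed UTF-8 by counting the bytes that
   start a character, i.e. everything not of the form 10xxxxxx. */
static UINTVAL utf8_codepoints(const unsigned char *buf, size_t len)
{
    UINTVAL count = 0;
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the six bytes 61 C3 A9 E2 82 AC ("a", U+00E9, U+20AC) this gives 3 codepoints against 6 bytes.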


I have, I'm sure, forgotten something, but let's start with this and fill in the blanks.
--
Dan


--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
