On Thu, 25 Oct 2001, Dan Sugalski wrote:

> The only bits of the interpreter that much care about the
> string data are the regex engine parts, and those only operate on
> fixed-sized data.

Care to elaborate?  I thought the mandate from Larry was to have regexes
compile down to a stream of string ops.  Doesn't that mean it should work
regardless of the encoding of the string?

> The interpreter can only peek inside a string if that string is of
> fixed length, and the interpreter doesn't actually care about the
> character set the data is in.

Why is this necessary at all?  Wouldn't it be preferable to have all
access go through the String vtable regardless of the encoding?
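To make that concrete, here is a minimal sketch. It assumes a hypothetical per-encoding vtable whose only job is the one the doc gives an encoding: turning the bytes at bufstart into 32-bit codepoints. None of these names (encoding_vtable, next_codepoint, string_hdr, string_index_of) are real Parrot API; they are made up for illustration. A string op such as a regex engine's "find this character" walks the string only through the vtable, so the same op handles fixed-width Latin-1 and variable-width UTF-8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BAD_CODEPOINT 0xFFFFFFFFu

/* Hypothetical encoding vtable (not Parrot's actual API): decode one
   codepoint starting at *pos and advance *pos past it. */
typedef struct encoding_vtable {
    const char *name;
    uint32_t (*next_codepoint)(const unsigned char *buf, size_t len, size_t *pos);
} encoding_vtable;

/* Hypothetical string header, loosely modelled on the doc's bufstart. */
typedef struct string_hdr {
    const unsigned char *bufstart;   /* raw bytes */
    size_t buflen;                   /* length in bytes, not characters */
    const encoding_vtable *encoding;
} string_hdr;

/* Fixed width: one byte is one codepoint (e.g. Latin-1). */
static uint32_t latin1_next(const unsigned char *buf, size_t len, size_t *pos) {
    (void)len;
    return buf[(*pos)++];
}

/* Variable width: UTF-8, 1 to 4 bytes per codepoint. Deliberately minimal:
   it does not reject overlong forms or surrogates. */
static uint32_t utf8_next(const unsigned char *buf, size_t len, size_t *pos) {
    unsigned char b = buf[*pos];
    size_t extra;
    uint32_t cp;

    if (b < 0x80)                { extra = 0; cp = b; }
    else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
    else { (*pos)++; return BAD_CODEPOINT; }       /* stray continuation byte */

    if (*pos + extra >= len) { *pos = len; return BAD_CODEPOINT; }  /* truncated */

    for (size_t i = 1; i <= extra; i++) {
        unsigned char c = buf[*pos + i];
        if ((c & 0xC0) != 0x80) { (*pos)++; return BAD_CODEPOINT; }
        cp = (cp << 6) | (c & 0x3F);
    }
    *pos += extra + 1;
    return cp;
}

static const encoding_vtable latin1_encoding = { "latin-1", latin1_next };
static const encoding_vtable utf8_encoding   = { "utf-8",   utf8_next };

/* Encoding-agnostic op: index (in codepoints) of the first occurrence of
   target, or -1. It never inspects the bytes directly, only via the vtable. */
static long string_index_of(const string_hdr *s, uint32_t target) {
    assert(s != NULL && s->encoding != NULL);
    size_t pos = 0;
    long idx = 0;
    while (pos < s->buflen) {
        uint32_t cp = s->encoding->next_codepoint(s->bufstart, s->buflen, &pos);
        if (cp == target)
            return idx;
        idx++;
    }
    return -1;
}
```

With "café!" stored as Latin-1 (é is the single byte 0xE9) and as UTF-8 (é is 0xC3 0xA9), string_index_of returns 3 for U+00E9 in both cases, because it counts codepoints rather than bytes. The cost is that indexing into a variable-width string is a linear walk, which is presumably what the fixed-size restriction is trying to avoid.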

> =item encoding
>
> Pointer to the library that handles the string encoding. Encoding is
> basically how the stream of bytes pointed to by C<bufstart> can be
> turned into a stream of 32-bit codepoints. Examples include UTF-8, Big
> 5, or Shift JIS. Unicode, Ascii, or EBCDIC are B<not> encodings.first

.first?

Aside from the above, this was a nice refresher.

-sam
