On Thu, 25 Oct 2001, Dan Sugalski wrote:

> The only bits of the interpreter that much care about the
> string data are the regex engine parts, and those only operate on
> fixed-sized data.

Care to elaborate? I thought the mandate from Larry was to have regexes
compile down to a stream of string ops. Doesn't that mean it should work
regardless of the encoding of the string?

> The interpreter can only peek inside a string if that string is of
> fixed length, and the interpreter doesn't actually care about the
> character set the data is in.

Why is this necessary at all? Wouldn't it be preferable to have all
access go through the String vtable regardless of the encoding?

> =item encoding
>
> Pointer to the library that handles the string encoding. Encoding is
> basically how the stream of bytes pointed to by C<bufstart> can be
> turned into a stream of 32-bit codepoints. Examples include UTF-8, Big
> 5, or Shift JIS. Unicode, Ascii, or EBCDIC are B<not> encodings.first

".first"?

Aside from the above, this was a nice refresher.

-sam