>>> Jeff Clites <[EMAIL PROTECTED]> 2004-05-01 18:23:02 >>>

[Finishing this discussion on p6i, since it began here.]

> Good point. However, the more general usage seems to have largely
> fallen out of use (to the extent to which I'd forgotten about it until
> now). For instance, the Java String class lacks this generality.
> Additionally, ObjC's NSString and (from what I can tell) Python and
> Ruby conceive of strings as textual.
As a VM for multiple languages, Parrot must be more general than any
one of those languages, though, yes?

> The key point is that text and uninterpreted byte sequences are
> semantically oceans apart. I'd say that as data types, byte sequences
> are semantically much simpler than hashes (for instance), and
> strings-as-text are much more complex. It makes little sense to
> bitwise-not text, or to uppercase bytes.

If your "text" is taken from a size-two character set, it makes perfect
sense to complement (bitwise-not) it. Bit strings and text strings are
oceans apart like Alaska and Russia.

> The major problem with using "string" for the more general concept is
> confusion. People do tend to get really confused here. If you define
> "string of blahs" to mean "sequence of blahs" (to match the historical
> usage), that's on its face reasonable. But people jump to the
> conclusion that a string-as-bytes is re-interpretable as a
> string-as-text (and vice-versa) via something like a cast--a
> reinterpretation of the bytes of some in-memory representation.

It is thus reinterpretable---via (de-)serialization. Take a "text"
string, serialize it in memory as UTF-8, say, to get a bit string, and
do ANDs, ORs, and NOTs to your heart's content. If the in-memory
representation is already UTF-8, the serialization is nothing more than
changing the string's charset+encoding to "binary". Compilers for
languages like Perl 5, which treat strings as text or bits depending on
the operation being performed, can insert the serialization and
deserialization ops automatically as needed.
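To make that concrete, here is a rough sketch of the round trip done by
hand in Perl 5, using the Encode module (the sample string and variable
names are purely illustrative):

    use Encode qw(encode decode);

    my $text = "na\x{ef}ve";             # text string with one non-ASCII char
    my $bits = encode("UTF-8", $text);   # serialize: text -> bit string
    my $mask = ~$bits;                   # bitwise-not the bytes all you like
    my $back = decode("UTF-8", ~$mask);  # undo the not, deserialize to text
                                         # now $back eq $text

A compiler that knows ~ is a bit-string op could emit that encode/decode
pair itself; that is all the automatic insertion I'm suggesting amounts
to.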