>Another possibility is to use a UTF-8 extended system where you use values over 0x10FFFF to encode temporary code block swaps in the encoding. I.e.,
>some magic value means the one byte UTF-8 codes now mean the Greek block
>instead of the ASCII block.
You could do that, but then I'd be forced to do something well and truly horrible to you, and we'd rather not have that. :)
Character set and encoding are metadata, and ought to be stored out-of-band, at least once the data makes it into your program. Twiddling the internal representation of the bytes is a fairly sub-optimal way to do that, so I'd just as soon not mandate it. (I do dislike publicly breaking mandates like that. Terribly inconvenient.)
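For illustration only, here is a minimal Python sketch of what "out-of-band" could look like: the bytes are left untouched and the character set and encoding travel beside them as separate fields. The `TaggedString` name, its fields, and the use of Python codec names as encoding tags are all assumptions made for this example, not anything proposed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical string record: raw bytes plus out-of-band metadata."""

    data: bytes    # the bytes themselves, never altered to signal anything
    charset: str   # which character repertoire the code points belong to
    encoding: str  # how those code points were serialized into bytes

    def decode(self) -> str:
        # Assumption: Python's codec names stand in for the encoding tag.
        return self.data.decode(self.encoding)


# The same four bytes, carrying two different sets of metadata.
raw = b"\xce\xb1\xce\xb2"
greek = TaggedString(raw, charset="unicode", encoding="utf-8")
latin = TaggedString(raw, charset="latin-1", encoding="latin-1")

print(greek.decode())  # two Greek letters, alpha and beta
print(latin.decode())  # four Latin-1 characters
```

Nothing in the byte stream says which reading is right. The tags do, which is the whole point of keeping them outside the data.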
At 12:28 AM +0100 3/16/04, Karl Brodowsky wrote:
>Anyway, it will be necessary to specify the encoding of unicode in
>some way, which could possibly allow even to specify even some
>non-unicode-charsets.
While I'll skip diving deeper into the swamp that is character sets and encoding (I'm already up to my neck in it, thanks, and I don't have any long straws handy :), I'll point out that the above statement is meaningless--there *are* no Unicode non-unicode charsets.
It is possible to use the UTF encodings on non-unicode charsets--you could reasonably use UTF-8 to encode, say, Shift-JIS characters. (Shift-JIS is both an encoding and a character set, and the two can be separated.)
It's not unwise (and, in practice, at least in implementation, quite sensible) to separate the encoding from the character set, but you need to be careful to keep the separation clear, though many of the sets and encodings don't go out of their way to help with that.
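As a rough sketch of the "UTF-8 on Shift-JIS characters" idea, the Python below applies the UTF-8 byte layout to an arbitrary integer without assuming the integer is a Unicode code point. The function name `utf8_style_encode` is made up for this example, and treating the two-byte Shift-JIS code for a character as a single integer character number is an assumption about what "Shift-JIS characters" means here.

```python
def utf8_style_encode(value: int) -> bytes:
    """Serialize an integer using the 1- to 4-byte UTF-8 bit layout.

    Hypothetical helper: it does not check that `value` is a Unicode
    code point, only that it fits in the 21 bits a 4-byte sequence holds.
    """
    if value < 0 or value > 0x1FFFFF:
        raise ValueError("value does not fit the 4-byte UTF-8 layout")
    if value < 0x80:
        return bytes([value])
    if value < 0x800:
        return bytes([0xC0 | (value >> 6), 0x80 | (value & 0x3F)])
    if value < 0x10000:
        return bytes([
            0xE0 | (value >> 12),
            0x80 | ((value >> 6) & 0x3F),
            0x80 | (value & 0x3F),
        ])
    return bytes([
        0xF0 | (value >> 18),
        0x80 | ((value >> 12) & 0x3F),
        0x80 | ((value >> 6) & 0x3F),
        0x80 | (value & 0x3F),
    ])


# Assumption: the Shift-JIS byte pair for HIRAGANA LETTER A, read as one
# big-endian integer, serves as the character number in that charset.
sjis_value = int.from_bytes("\u3042".encode("shift_jis"), "big")  # 0x82A0
encoded = utf8_style_encode(sjis_value)
print(encoded.hex())  # e88aa0
```

A decoder that assumed Unicode would read those three bytes as U+82A0, a different character entirely, so the bytes only mean "hiragana A" if the charset tag says Shift-JIS. That is the separation that has to be kept clear.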
-- Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk