On Mon, 2004-08-09 at 14:14, Dan Sugalski wrote:
Additionally, if we have source text that is Latin-n, EBCDIC, ASCII, or whatever, we must be able to convert it to Unicode with no loss (which I believe is now doable with Unicode 4.0). Losslessly converting Unicode to ASCII/EBCDIC/whatever is *not* required, which is fine, as it's theoretically (and often practically) impossible.
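The "no loss" requirement can be stated as a round-trip check: decode the source bytes to Unicode, re-encode them, and compare. Below is a minimal sketch using Python's built-in codecs purely as a stand-in; the helper names to_unicode and is_lossless are hypothetical and are not Parrot APIs.

```python
# Sketch only: Python codecs stand in for a VM's charset conversion layer.
# to_unicode and is_lossless are hypothetical helper names.

def to_unicode(raw: bytes, charset: str) -> str:
    """Decode bytes in `charset` to a Unicode string, failing on invalid input."""
    return raw.decode(charset, errors="strict")


def is_lossless(raw: bytes, charset: str) -> bool:
    """True if decoding to Unicode and re-encoding recovers the original bytes."""
    return to_unicode(raw, charset).encode(charset) == raw


if __name__ == "__main__":
    print(is_lossless(bytes(range(256)), "latin-1"))  # every Latin-1 byte value
    print(is_lossless(bytes(range(128)), "ascii"))    # every ASCII byte value
```

Because Latin-1 maps its 256 byte values directly onto U+0000 through U+00FF, the round trip holds for every possible input in that charset.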
Can I suggest instead:
If we have source text in a non-Unicode character set, we must be able to convert it to Unicode with minimal loss (minimal being defined as zero for all character sets that are subsets of Unicode).

Converting Unicode to non-Unicode character sets will be lossless where possible; otherwise, the conversion will attempt to spell out the character's name, in ASCII characters, in the target character set.
Gack. No, I think this'd be a bad idea as the default behavior. What's right is up in the air -- I'm figuring we'll either throw an exception or substitute in a default character, but the full expansion's definitely way too much.
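The three candidate behaviors for an unrepresentable character (throw an exception, substitute a default character, or spell out the full name as proposed above) happen to map onto Python's codec error handlers. This is a sketch under that analogy only; the convert helper is hypothetical and says nothing about how Parrot actually implements conversion.

```python
# Sketch: Python's codec error handlers as stand-ins for three policies.
#   "strict"      -> throw an exception
#   "replace"     -> substitute a default character ('?' for ASCII)
#   "namereplace" -> spell out the character's name (the proposal; Python 3.5+)
# convert is a hypothetical helper name.

def convert(text: str, charset: str, policy: str) -> bytes:
    """Encode `text` into `charset`, handling unencodable characters per `policy`."""
    return text.encode(charset, errors=policy)


if __name__ == "__main__":
    sample = "caf\u00e9"  # "café"; the final character is not in ASCII
    for policy in ("strict", "replace", "namereplace"):
        try:
            print(policy, convert(sample, "ascii", policy))
        except UnicodeEncodeError as exc:
            print(policy, "raised", type(exc).__name__)
```

The name expansion turns the single character U+00E9 into 35 ASCII bytes (`\N{LATIN SMALL LETTER E WITH ACUTE}`), which illustrates why it is heavy as a default.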
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
                                      have teddy bears and even
                                      teddy bears get drunk