At 4:15 PM -0400 8/10/04, Aaron Sherman wrote:
> On Mon, 2004-08-09 at 14:14, Dan Sugalski wrote:
>
>> Additionally if we have source text which is
>> Latin-n, EBCDIC, ASCII, or whatever we must be
>> able to convert it with no loss to Unicode.
>> (Which I believe is now doable with Unicode 4.0)
>> Losslessly converting Unicode to
>> ASCII/EBCDIC/whatever is *not* required, which is
>> fine as it's theoretically (and often
>> practically) impossible.
>
> Can I suggest instead:
>
> If we have source text which is comprised of a non-Unicode
> character-set we must be able to convert it with minimal loss to
> Unicode (minimal being defined as zero for all Unicode-subset
> character sets).
> Converting Unicode to non-Unicode character sets will be
> lossless where possible, and will attempt to encode the name of
> the character in ASCII characters into the target character set.

Gack. No, I think this'd be a bad idea as the default behavior. What's right is up in the air -- I'm figuring we'll either throw an exception or substitute in a default character, but the full expansion's definitely way too much.
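
For illustration only, the three behaviors under discussion (throw an exception, substitute a default character, or expand to the character name) can be sketched with Python's standard codec error handlers. This is not Parrot code and says nothing about what Parrot will actually do; the helper name to_ascii and the sample string are made up for the example.

```python
# Illustration only: Python's built-in codec error handlers, not Parrot's API.
# The sample text contains U+00EF and U+2603 (SNOWMAN); neither exists in ASCII.
text = "na\u00efve \u2603"


def to_ascii(s, policy):
    """Hypothetical helper: encode s to ASCII using a standard error handler."""
    return s.encode("ascii", errors=policy)


# Option 1: throw an exception at the first unrepresentable character.
try:
    to_ascii(text, "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Option 2: substitute a default character (Python uses '?').
substituted = to_ascii(text, "replace")

# Option 3: expand each unrepresentable character to its name in ASCII,
# which is roughly the behavior suggested in the quoted text.
expanded = to_ascii(text, "namereplace")

# The other direction is lossless for a Unicode-subset character set:
# every Latin-1 byte maps to a code point and back again unchanged.
latin1_bytes = bytes(range(256))
round_trip = latin1_bytes.decode("latin-1").encode("latin-1")

print(strict_failed)
print(substituted)
print(expanded)
print(round_trip == latin1_bytes)
```

The expanded form turns a seven-character string into several dozen bytes, which gives a feel for why full name expansion looks heavy as a default.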
--
Dan


--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
