On Mon, 2004-08-09 at 14:14, Dan Sugalski wrote:
> Additionally if we have source text which is
> Latin-n, EBCDIC, ASCII, or whatever we must be
> able to convert it with no loss to Unicode.
> (Which I believe is now doable with Unicode 4.0)
> Losslessly converting Unicode to
> ASCII/EBCDIC/whatever is *not* required, which is
> fine as it's theoretically (and often
> practically) impossible.
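For the single-byte cases in that paragraph, "no loss" can be checked mechanically. Decode every byte value to Unicode, re-encode it, and confirm the original bytes come back. The sketch below uses Python's codecs only as a stand-in. It assumes `latin-1` represents "Latin-n" and `cp037` represents "EBCDIC"; other Latin-n and EBCDIC code pages would need their own checks.

```python
def round_trips(encoding: str) -> bool:
    """Return True if all 256 byte values survive bytes -> Unicode -> bytes.

    Assumption: the encoding is a single-byte code page that defines every
    byte value (true for Python's 'latin-1' and 'cp037' codecs).
    """
    raw = bytes(range(256))
    text = raw.decode(encoding)          # legacy bytes -> Unicode code points
    return text.encode(encoding) == raw  # and back again, byte for byte


for enc in ("latin-1", "cp037"):
    print(enc, round_trips(enc))
```

The reverse direction is where loss appears. For example, `"\u263a".encode("cp037")` raises `UnicodeEncodeError`.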
Can I suggest instead:

If we have source text composed of a non-Unicode character set, we must be able to convert it to Unicode with minimal loss (minimal being defined as zero for all character sets that are subsets of Unicode).

Converting Unicode to a non-Unicode character set will be lossless where possible. Where a character has no equivalent in the target set, the conversion will instead write the character's Unicode name, spelled in ASCII characters, into the target character set. An example would be the conversion of the UTF-8 string (in Perl 5 notation):

    "foo \x{263a} bar"

to the ASCII representation:

    "foo {WHITE SMILING FACE} bar"

There are 4 cases in which the conversion raises an exception:

1) the ASCII name is not available
2) the ASCII name cannot be converted into the target character set (recursive name lookups are not allowed, nor would they be very useful)
3) a VM parameter requesting exceptions on failed character-set conversions has been set to a true value
4) the source is a PMC, and that PMC has a property indicating that exceptions should be generated on failed conversions

This just seems a bit more useful in the general case to me, while allowing the language implementation the option of requesting an exception either globally or per-PMC.

Thoughts?

-- 
781-324-3772
[EMAIL PROTECTED]
http://www.ajs.com/~ajs
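To make the proposal concrete, here is a rough Python sketch. It is an illustration only and is not tied to any VM code. Each character is encoded directly where the target set allows it. Otherwise the character is replaced by `{NAME}` using `unicodedata.name()`, which gives the official name of U+263A as `WHITE SMILING FACE`. `ConversionError` and the `strict` flag are hypothetical names. `strict` stands in for both the VM-wide parameter and the per-PMC property (cases 3 and 4). Per-character encoding also assumes a stateless target encoding such as ASCII or an EBCDIC code page.

```python
import unicodedata


class ConversionError(Exception):
    """Hypothetical conversion exception for this sketch."""


def to_charset(text: str, encoding: str, *, strict: bool = False) -> bytes:
    """Encode text, replacing unencodable characters with {UNICODE NAME}.

    strict=True models cases 3 and 4 of the proposal: any character that
    cannot be encoded directly raises ConversionError instead.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)  # representable: lossless copy
            continue
        except UnicodeEncodeError:
            pass
        if strict:
            raise ConversionError(f"U+{ord(ch):04X} not representable in {encoding}")
        name = unicodedata.name(ch, None)
        if name is None:  # case 1: no name available
            raise ConversionError(f"U+{ord(ch):04X} has no name")
        try:
            # case 2: the name itself must encode directly; no recursive lookup
            out += ("{" + name + "}").encode(encoding)
        except UnicodeEncodeError:
            raise ConversionError(
                f"name of U+{ord(ch):04X} not representable in {encoding}"
            ) from None
    return bytes(out)


print(to_charset("foo \u263a bar", "ascii"))
```

Python's built-in `namereplace` error handler does something similar, though with different delimiters. For example, `"\u263a".encode("ascii", "namereplace")` yields `b'\\N{WHITE SMILING FACE}'`. A hand-rolled loop is used here only so that the exception cases are visible.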