On Mon, 2004-08-09 at 14:14, Dan Sugalski wrote:

> Additionally if we have source text which is 
> Latin-n, EBCDIC, ASCII, or whatever we must be 
> able to convert it with no loss to Unicode. 
> (Which I believe is now doable with Unicode 4.0) 
> Losslessly converting Unicode to 
> ASCII/EBCDIC/whatever is *not* required, which is 
> fine as it's theoretically (and often 
> practically) impossible.

Can I suggest instead:

        If we have source text in a non-Unicode character set, we must
        be able to convert it to Unicode with minimal loss (minimal
        being defined as zero for every character set that is a subset
        of Unicode).
        
        Converting Unicode to a non-Unicode character set will be
        lossless where possible. Where a character has no equivalent in
        the target character set, the conversion will attempt to
        substitute the character's ASCII name, encoded in the target
        character set. An example would be the conversion of the UTF-8
        string (in Perl 5 notation):
        
                "foo \x{263a} bar"
        
        to the ASCII representation:
        
                "foo {WHITE SMILING FACE} bar"
        
        There are 4 possible failure modes, each resulting in a
        conversion exception:

        1) the ASCII name is not available
        2) the ASCII name cannot be converted into the target
           character set (recursive name-lookups are not allowed,
           nor would they be very useful)
        3) a VM parameter requesting exceptions on failed
           character-set conversions has been set to a true value
        4) the source is a PMC and that PMC has a property
           indicating that exceptions should be generated on failed
           conversions
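
As a rough illustration of the rules above, here is a minimal sketch in
Python (not Parrot code). The names `convert`, `ConversionError` and
`strict` are hypothetical; `strict` stands in for both the VM-wide
parameter and the per-PMC property. It encodes one character at a time,
so it assumes a stateless target character set such as ASCII, Latin-1 or
an EBCDIC code page. Python's `unicodedata` reports the name of U+263A
as WHITE SMILING FACE.

```python
import unicodedata


class ConversionError(Exception):
    """A character could not be represented in the target character set."""


def convert(text, target="ascii", strict=False):
    """Encode text into target, substituting {NAME} where a character is missing.

    strict is a hypothetical stand-in for the VM parameter or PMC property
    that requests an exception on any failed conversion.
    """
    out = []
    for ch in text:
        try:
            # Lossless where possible: the character exists in the target set.
            out.append(ch.encode(target))
            continue
        except UnicodeEncodeError:
            pass

        code = f"U+{ord(ch):04X}"
        if strict:
            # Failure modes 3 and 4: the caller asked for exceptions.
            raise ConversionError(f"{code} is not representable in {target}")

        # Failure mode 1: the character has no name.
        name = unicodedata.name(ch, None)
        if name is None:
            raise ConversionError(f"{code} has no character name")

        try:
            out.append(("{" + name + "}").encode(target))
        except UnicodeEncodeError as exc:
            # Failure mode 2: the name itself does not fit the target set.
            # No recursive lookup is attempted.
            raise ConversionError(f"name of {code} does not fit in {target}") from exc
    return b"".join(out)


print(convert("foo \u263a bar"))  # b'foo {WHITE SMILING FACE} bar'
```

Passing `strict=True` makes the same call raise `ConversionError`
instead of substituting the name, which is the behaviour a language
implementation could request globally or per-PMC.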

This just seems a bit more useful in the general case to me, while
allowing the language implementation the option of requesting an
exception either globally or per-PMC.

Thoughts?

-- 
781-324-3772
[EMAIL PROTECTED]
http://www.ajs.com/~ajs
