Larry Wall <[EMAIL PROTECTED]> writes:
> Russ Allbery writes:

>> Particularly since extending UTF-8 to more than 31 bits requires
>> breaking some of the guarantees that UTF-8 makes, unless I'm missing
>> how you're encoding the first byte so as not to give it a value of
>> 0xFE.

> The UTF-16 BOMs, 0xFEFF and 0xFFFE, both turn out to be illegal UTF-8 in
> any case, so it doesn't much matter, assuming BOMs are used on UTF-16
> that has to be auto-distinguished from UTF-8.  (Doing any kind of
> auto-recognition on 16-bit data without BOMs is problematic in any
> case.)

Yeah, but one of the guarantees of UTF-8 is:

   -  The octet values FE and FF never appear.

I can see that this property may not be that important, but it makes me
feel like things that don't have this property aren't really UTF-8.
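
To make the point concrete, here is a minimal sketch, assuming the original UTF-8 bit pattern (n one bits, a zero bit, then payload) is simply continued past six bytes. The helper names (lead_byte_range, max_bits, is_utf8) are made up for illustration and not taken from any library. It shows that six bytes top out at 31 bits with lead bytes FC-FD, and that a seven-byte form would need a lead byte of exactly FE.

```python
def lead_byte_range(n):
    """Return (lowest, highest) lead byte of an n-byte sequence, assuming
    the pattern 'n one bits, one zero bit, payload bits' is continued."""
    if n == 1:
        return 0x00, 0x7F
    low = (0xFF << (8 - n)) & 0xFF       # n leading one bits, rest zero
    payload_bits = max(7 - n, 0)         # bits left in the lead byte
    return low, low | ((1 << payload_bits) - 1)


def max_bits(n):
    """Number of payload bits an n-byte sequence can carry."""
    if n == 1:
        return 7
    return max(7 - n, 0) + 6 * (n - 1)   # lead byte bits + 6 per trailing byte


def is_utf8(data):
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for n in range(1, 8):
        low, high = lead_byte_range(n)
        print(f"{n} bytes: lead {low:02X}-{high:02X}, {max_bits(n)} bits")
    # Both byte orders of the UTF-16 BOM are rejected as UTF-8.
    print(is_utf8(b"\xfe\xff"), is_utf8(b"\xff\xfe"))
```

Under that assumption, anything past 31 bits puts FE into the byte stream, which is exactly the octet the guarantee above rules out.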

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
