* Frank Ch. Eigler:

>> > Yes, and yet we have had the bidi situation recently where UTF-8 raw
>> > codes could visually confuse a human reader whereas escaped \uXXXX
>> > wouldn't.  If we forbid \uXXXX unilaterally, we literally become
>> > incompatible with JSON (RFC8259 7. String. "Any character may be
>> > escaped."), and for what?
>> 
>> RFC 8259 says this:
>> 
>>    However, the ABNF in this specification allows member names and
>>    string values to contain bit sequences that cannot encode Unicode
>>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>>    surrogate).  Instances of this have been observed, for example, when
>>    a library truncates a UTF-16 string without checking whether the
>>    truncation split a surrogate pair.  The behavior of software that
>>    receives JSON texts containing such values is unpredictable; for
>>    example, implementations might return different values for the length
>>    of a string value or even suffer fatal runtime exceptions.
>> 
>> A UTF-8 environment has to enforce *some* additional constraints
>> compared to the official JSON syntax.
>
> I'm sorry, I don't see how.  If a JSON string were to include the
> suspect "\uDEAD", but from observing our hypothetical "no escapes!"
> rule they could reencode it as the UTF-8 octets 0xED 0xBA 0xAD.
> ISTM we're no better off.

These octets aren't UTF-8.  They would decode to U+DEAD, which lies in
the surrogate range U+D800..U+DFFF, and UTF-8 never encodes surrogate
code points (paired or unpaired). 8-(
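
As a rough illustration (a sketch only, assuming CPython 3 and its strict
"utf-8" codec; the helper name is_valid_utf8 is made up for this example),
the octets from the quoted message fail to decode, while the JSON escape
is still accepted by Python's json module and produces a string that
cannot be written back out as UTF-8:

```python
import json

# The three octets from the quoted message.
octets = bytes([0xED, 0xBA, 0xAD])


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes under the strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Strict decoding rejects the octets, since they would map to U+DEAD.
print(is_valid_utf8(octets))

# The escaped form is accepted by Python's json module, which returns
# a one-character string holding the lone surrogate code point.
s = json.loads('"\\uDEAD"')
print(len(s), hex(ord(s)))

# That string cannot be encoded as UTF-8 either.
try:
    s.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

Other JSON parsers may behave differently on the escaped form, which is
the unpredictability the RFC text above describes.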

Thanks,
Florian
