* Frank Ch. Eigler:

>> > Yes, and yet we have had the bidi situation recently where UTF-8 raw
>> > codes could visually confuse a human reader whereas escaped \uXXXX
>> > wouldn't.  If we forbid \uXXXX unilaterally, we literally become
>> > incompatible with JSON (RFC8259 7. String. "Any character may be
>> > escaped."), and for what?
>>
>> RFC 8259 says this:
>>
>>    However, the ABNF in this specification allows member names and
>>    string values to contain bit sequences that cannot encode Unicode
>>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>>    surrogate).  Instances of this have been observed, for example, when
>>    a library truncates a UTF-16 string without checking whether the
>>    truncation split a surrogate pair.  The behavior of software that
>>    receives JSON texts containing such values is unpredictable; for
>>    example, implementations might return different values for the length
>>    of a string value or even suffer fatal runtime exceptions.
>>
>> A UTF-8 environment has to enforce *some* additional constraints
>> compared to the official JSON syntax.
>
> I'm sorry, I don't see how.  If a JSON string were to include the
> suspect "\uDEAD", but from observing our hypothetical "no escapes!"
> rule they could reencode it as the UTF-8 octets 0xED 0xBA 0xAD.
> ISTM we're no better off.

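These octets aren't UTF-8.  UTF-8 never contains surrogates (paired or
unpaired): the code points U+D800 through U+DFFF are excluded from the
encoding, so a conforming decoder has to reject 0xED 0xBA 0xAD.  8-(

Thanks,
Florian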
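
A minimal sketch of the point, assuming Python 3 and its standard strict
UTF-8 codec.  The helper names is_valid_utf8 and can_encode_utf8 are
hypothetical and only serve the illustration; other JSON libraries may
treat the escaped lone surrogate differently.

```python
import json

# The three octets from the quoted message: U+DEAD pushed through the
# generic three-byte UTF-8 bit layout.
octets = bytes([0xED, 0xBA, 0xAD])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def can_encode_utf8(text: str) -> bool:
    """Return True if the string can be encoded as strict UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Python's json module accepts the escaped form and yields a string
# holding one unpaired surrogate code point.
lone = json.loads('"\\uDEAD"')

print(is_valid_utf8(octets))   # strict decoder verdict on the raw octets
print(can_encode_utf8(lone))   # strict encoder verdict on the lone surrogate

# Only the non-standard "surrogatepass" error handler produces the octets.
print(lone.encode("utf-8", "surrogatepass") == octets)
```

In other words, the raw-octet form is only reachable by stepping outside
strict UTF-8, so a "no escapes" rule in a UTF-8 environment would already
exclude the "\uDEAD" case.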