On Wed, Nov 12, 2003 at 02:07:52PM -0800, Mark A. Biggar wrote:

> And even when the sequence of Unicode code-points is the same, some
> encodings have multiple byte sequences for the same code-point.  For
> example, UTF-8 has two ways to encode a code-point that is larger than
> 0xFFFF (Unicode has code-points up to 0x10FFFF), as either two 16 bit
> surrogate code points encoded as two 3 byte UTF-8 code sequences or as
> a single value encoded as a single 4 or 5 byte UTF-8 code sequence.

Is it legal to encode surrogate pairs as UTF-8? Or does that count as
malformed UTF-8?
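
One way to probe the question is to hand both byte sequences to a strict
decoder. The sketch below uses Python 3's built-in "utf-8" codec purely as
an illustration (Python is an assumption here; nothing in this thread uses
it), and the helper name decodes_strictly is made up for the example. It
takes U+10000, whose surrogate pair is D800 DC00, and compares the single
4-byte form with the two 3-byte sequences described above. RFC 3629 forbids
encoding U+D800..U+DFFF in UTF-8, and this codec follows that rule, so the
surrogate form is rejected unless the non-default "surrogatepass" error
handler is requested.

```python
# U+10000 encoded directly as a single 4-byte UTF-8 sequence.
four_byte = "\U00010000".encode("utf-8")

# The same code point written as the surrogate pair D800 DC00, with each
# surrogate encoded as its own 3-byte sequence (the layout known as CESU-8).
surrogate_bytes = b"\xed\xa0\x80\xed\xb0\x80"


def decodes_strictly(data: bytes) -> bool:
    """Hypothetical helper: True if the strict UTF-8 codec accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(four_byte.hex(), decodes_strictly(four_byte))
    print(surrogate_bytes.hex(), decodes_strictly(surrogate_bytes))
    # Opting in to "surrogatepass" yields two lone surrogates, not U+10000.
    print(len(surrogate_bytes.decode("utf-8", "surrogatepass")))
```

Other decoders may be more lenient, so this shows what one strict
implementation does rather than settling what every tool accepts.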

Nicholas Clark
