On Wed, Nov 12, 2003 at 02:07:52PM -0800, Mark A. Biggar wrote:
> And even when the sequence of Unicode code-points is the same, some
> encodings have multiple byte sequences for the same code-point. For
> example, UTF-8 has two ways to encode a code-point that is larger the
> 0xFFFF (Unicode as code-points up to 0x10FFF), as either two 16 bit
> surrogate code points encoded as two 3 byte UTF-8 code sequences or as
> a single value encoded as a single 4 or 5 byte UTF-8 code sequence.

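To make the two byte sequences described above concrete, here is a rough sketch in Python (chosen only for illustration; nothing in this thread uses it). The helper names are hypothetical, and the sketch assumes a code point in the range 0x10000 to 0x10FFFF. It only builds the two candidate byte sequences and says nothing about which of them a decoder should accept.

```python
def utf8_bits(value: int) -> bytes:
    """Apply the generic UTF-8 bit layout to an integer (1 to 4 bytes).

    Hypothetical helper: it does no validity checking, so it will happily
    lay out surrogate values (0xD800..0xDFFF) as 3-byte sequences.
    """
    if value < 0x80:
        return bytes([value])
    if value < 0x800:
        return bytes([0xC0 | (value >> 6), 0x80 | (value & 0x3F)])
    if value < 0x10000:
        return bytes([0xE0 | (value >> 12),
                      0x80 | ((value >> 6) & 0x3F),
                      0x80 | (value & 0x3F)])
    return bytes([0xF0 | (value >> 18),
                  0x80 | ((value >> 12) & 0x3F),
                  0x80 | ((value >> 6) & 0x3F),
                  0x80 | (value & 0x3F)])


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above 0xFFFF into its UTF-16 surrogate pair."""
    # Assumption: 0x10000 <= cp <= 0x10FFFF.
    offset = cp - 0x10000
    return 0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)


def as_single_sequence(cp: int) -> bytes:
    """The 'single value' form: one 4-byte sequence."""
    return utf8_bits(cp)


def as_encoded_surrogates(cp: int) -> bytes:
    """The 'two surrogates' form: two 3-byte sequences, 6 bytes total."""
    high, low = surrogate_pair(cp)
    return utf8_bits(high) + utf8_bits(low)


if __name__ == "__main__":
    cp = 0x10000
    print(as_single_sequence(cp).hex(" "))     # f0 90 80 80
    print(as_encoded_surrogates(cp).hex(" "))  # ed a0 80 ed b0 80
```

For U+10000 the first form is 4 bytes and the second is 6 bytes, which is the ambiguity the quoted paragraph is pointing at.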
Is it legal to encode surrogate pairs as UTF8? Or does that count as
malformed UTF8?

Nicholas Clark