[9fans] About The Codes Beyond Unicode-BMP

Hongzheng Wang Thu, 13 Mar 2008 08:04:44 -0700

Hi,

I ran an experiment to test whether the programs in Plan9 support
code points beyond the Unicode BMP.
The result is not so good.


Here is how to reproduce it:

Take the code point U+010000 as an example.  Create a file containing
only that one character, encoded in UTF-8.

Note that this can be done with Vim on Linux, since Vim has good
support for it.  Nvi can be used to double-check.  The UTF-8 byte
sequence for U+010000 is F0908080 [1].
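
For anyone without Vim at hand, the test file can also be produced by a
short script.  This is only a sketch in Python, not what was used for
the experiment above; the names write_test_file and hexdump and the
file name "test" are made up for illustration.

```python
# Sketch: create a file holding only U+010000 in UTF-8 and dump its bytes.
# Function names and the file name are illustrative assumptions.

CHAR = "\U00010000"  # first code point beyond the BMP


def write_test_file(path):
    """Write U+010000 encoded in UTF-8 to path and return the bytes written."""
    data = CHAR.encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
    return data


def hexdump(path):
    """Return the file contents as an uppercase hex string."""
    with open(path, "rb") as f:
        return f.read().hex().upper()


if __name__ == "__main__":
    write_test_file("test")
    print(hexdump("test"))  # expected to match the F0908080 quoted above
```

Running hexdump again after saving the file from ed, sam, or acme gives
a quick before/after comparison.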

Open the file with ed, sam, or acme.  Of course, it cannot be
displayed correctly, since no font in the system covers such a code
point yet.  Then just write the file out again and reopen it with a
non-Plan9 program, say, Nvi on Linux.  The bytes have become
EFBFBDEFBFBDEFBFBDEFBFBD.  That is, ed, sam, and acme all failed to
recognize U+010000 encoded in UTF-8, and destroyed it when writing.
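
EFBFBD is the UTF-8 encoding of U+FFFD, the replacement character, so
the output is four replacement characters, one per input byte.  One
plausible explanation (an assumption, not something verified against
the Plan9 sources here) is a decoder that accepts only sequences of up
to three bytes and substitutes U+FFFD for every byte it cannot place.
The Python sketch below imitates such a decoder; decode_bmp_only is a
hypothetical name, and the check is simplified (it does not reject
overlong forms or surrogates).

```python
# Sketch of a BMP-only UTF-8 decoder (hypothetical, for illustration).
# It accepts 1- to 3-byte sequences and emits U+FFFD for any other byte.

REPLACEMENT = "\ufffd"


def decode_bmp_only(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # 1-byte sequence (ASCII)
            n, cp = 1, b
        elif 0xC0 <= b < 0xE0:       # lead byte of a 2-byte sequence
            n, cp = 2, b & 0x1F
        elif 0xE0 <= b < 0xF0:       # lead byte of a 3-byte sequence
            n, cp = 3, b & 0x0F
        else:
            # Stray continuation byte, or a lead byte of a 4-byte
            # sequence, which this decoder does not know about.
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1:i + n]
        if len(tail) != n - 1 or any((t & 0xC0) != 0x80 for t in tail):
            out.append(REPLACEMENT)  # truncated or malformed sequence
            i += 1
            continue
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        out.append(chr(cp))
        i += n
    return "".join(out)


if __name__ == "__main__":
    damaged = decode_bmp_only(b"\xf0\x90\x80\x80").encode("utf-8")
    print(damaged.hex().upper())
```

Under that assumption, F0 is rejected as an unknown lead byte and each
of 90, 80, 80 is rejected as a stray continuation byte, which matches
the four replacement characters seen in the rewritten file.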

So, does Plan9 actually support only the codes in the Unicode BMP?

BTW: the attachment is the gzipped test file containing only U+010000
encoded in UTF-8.

[1] http://en.wikipedia.org/wiki/UTF-8

-- 
HZ

test.gz
Description: GNU Zip compressed data
