John Machin wrote:
> 1. *By definition*, you can encode *any* Unicode string into utf-8.
> Proves nothing.
> 2. \u00a0 [no-break space] has no equivalent in gb2312, nor in the
> later gbk alias cp936. It does have an equivalent in the latest Chinese
> encoding, gb18030.

Also, *by definition*, though :-)

For those who have not followed encodings too closely: gb18030 is to
gb2312 what UTF-8 is to ASCII. Both encode the entire Unicode in an
algorithmic way, and provide byte-for-byte identical encodings for
their respective subsets.

Regards,
Martin
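
For anyone who wants to check the points above at the interpreter, here is a minimal sketch using only Python's built-in codecs. The helper name can_encode and the sample strings are my own choices for illustration, not anything from the thread, and the byte-for-byte comparison is only shown for a couple of common characters, not for the whole gb2312 repertoire.

```python
# Illustrative sketch only; can_encode() is a hypothetical helper and the
# sample strings are arbitrary.
NBSP = "\u00a0"  # NO-BREAK SPACE


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`, else False."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Which codecs have an equivalent for the no-break space?
for codec in ("utf-8", "gb2312", "gbk", "cp936", "gb18030"):
    print(codec, can_encode(NBSP, codec))

# gb18030 round-trips the character without loss.
print(NBSP.encode("gb18030").decode("gb18030") == NBSP)

# For text inside the smaller repertoire, the superset encoding yields
# the same bytes as the subset encoding.
print("abc".encode("ascii") == "abc".encode("utf-8"))
print("中文".encode("gb2312") == "中文".encode("gb18030"))
```

In Python, cp936 is registered as an alias of the gbk codec, so those two rows always agree.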