Re: Encode exception for chinese text

Martin v. Löwis Fri, 19 May 2006 07:46:47 -0700

John Machin wrote:
> 1. *By definition*, you can encode *any* Unicode string into utf-8.
> Proves nothing.
> 2. \u00a0 [no-break space] has no equivalent in gb2312, nor in the
> later gbk alias cp936. It does have an equivalent in the latest Chinese
> encoding, gb18030.


Also, *by definition*, though :-) For those that have not followed
encodings too closely: gb18030 is to gb2312 what UTF-8 is to ASCII.
Both encode the entire Unicode in an algorithmic way, and provide
byte-for-byte identical encodings for their respective
subsets.
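
To make both points concrete, here is a minimal sketch, assuming a
Python 3 interpreter with the standard CJK codecs available (the
original thread predates Python 3). The helper name can_encode and the
sample strings are made up for illustration only. It checks which
codecs accept U+00A0, and that a string drawn from the smaller
repertoire yields the same bytes under both the small and the large
encoding.

```python
def can_encode(text, codec):
    """Return True if `text` can be encoded with the named codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


NBSP = "\u00a0"  # NO-BREAK SPACE, the character from the quoted message

# Which of these codecs can represent U+00A0?
results = {
    codec: can_encode(NBSP, codec)
    for codec in ("gb2312", "gbk", "gb18030", "utf-8")
}
print(results)

# A string from the smaller repertoire should produce identical bytes
# under the larger encoding (illustrative samples, not an exhaustive check).
sample = "\u4e2d\u6587"  # two common Chinese characters present in gb2312
same_chinese = sample.encode("gb2312") == sample.encode("gb18030")
same_ascii = "abc".encode("ascii") == "abc".encode("utf-8")
print(same_chinese, same_ascii)
```

If the codecs behave as described above, gb2312 and gbk should reject
the no-break space while gb18030 and utf-8 accept it, and both
comparisons should come out equal.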

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list
