1. *By definition*, you can encode *any* Unicode string into utf-8, so the fact that your text encodes into utf-8 proves nothing.

2. \u00a0 [no-break space] has no equivalent in gb2312, nor in its later extension gbk (alias cp936). It does have an equivalent in the latest Chinese encoding, gb18030.

3. gb2312 is outdated. It is not really an "appropriate" charset for much of anything these days. You need to find out what your requirements really are. The unknowing will cheerfully use "gb" to mean one or more of those, or to mean "anything that's not big5" :-)

4. The slab of text you supplied is genuine unicode and encodes happily into all those gb* charsets. It does *not* contain \u00a0.
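A quick way to check points 2 and 4 for yourself is a sketch like the following, written for Python 3 and using only the stdlib codecs. The helpers `can_encode` and `unencodable_chars` are hypothetical names for illustration, not part of any library. The first reports whether a string encodes cleanly (strict error handling) in a given codec. The second lists the characters that do not, which tells you whether a \u00a0 is really lurking in your text.

```python
# Sketch: which Chinese codecs can represent U+00A0 (NO-BREAK SPACE)?
# can_encode / unencodable_chars are hypothetical helpers, not a library API.
NBSP = "\u00a0"


def can_encode(text, codec):
    """Return True if `text` encodes in `codec` with strict error handling."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def unencodable_chars(text, codec):
    """Return the distinct characters of `text` that `codec` cannot encode,
    in order of first appearance."""
    bad = []
    for ch in text:
        if ch not in bad and not can_encode(ch, codec):
            bad.append(ch)
    return bad


if __name__ == "__main__":
    # cp936 is just an alias for gbk in Python's codec registry.
    for codec in ("utf-8", "gb2312", "gbk", "cp936", "gb18030"):
        print(f"{codec:8} {can_encode(NBSP, codec)}")

    sample = "中文\u00a0text"
    print([hex(ord(c)) for c in unencodable_chars(sample, "gb2312")])
```

Run it against your own text (not just the sample) with each candidate codec. If `unencodable_chars` comes back empty for the codec you are targeting, the no-break space is not your problem.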
I do hope some of this helps ...

Cheers,
John
-- http://mail.python.org/mailman/listinfo/python-list