MvL wrote:
> Also, *by definition*, though :-)
Ah yes, indeed; and thanks for reminding me. Aside: Similar definition,
but not similar design: IMHO utf-8 sits on top of ASCII like a rose on
a stalk, whereas gb18030 sits on top of gb2312 like a rhinoceros on a
unicycle :-)
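A minimal sketch of the "sits on top of" comparison, assuming Python 3 and its standard utf-8, gb2312 and gb18030 codecs. ASCII text is byte-identical under utf-8, and GB2312 text is byte-identical under gb18030. A character outside the base set, such as U+00A0, takes two bytes in utf-8 but falls into gb18030's four-byte area. The sample strings are illustrative only.

```python
# Compare how each superset encoding treats text from its base charset.
ascii_text = "hello"
gb2312_text = "中文"   # both characters are in GB2312
nbsp = "\u00a0"        # no-break space, not in GB2312

# utf-8 leaves ASCII bytes untouched.
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))       # True

# gb18030 leaves GB2312 byte sequences untouched.
print(gb2312_text.encode("gb18030") == gb2312_text.encode("gb2312"))  # True

# Outside the base charset: 2 bytes in utf-8, a 4-byte sequence in gb18030.
print(len(nbsp.encode("utf-8")), len(nbsp.encode("gb18030")))         # 2 4
```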
Cheers,
John
--
http://
John Machin wrote:
> 1. *By definition*, you can encode *any* Unicode string into utf-8.
> Proves nothing.
> 2. \u00a0 [no-break space] has no equivalent in gb2312, nor in the
> later gbk alias cp936. It does have an equivalent in the latest Chinese
> encoding, gb18030.
Also, *by definition*, though :-)
Hey Serge, John,
Thank you very much. I was really not aware of these facts. Anyway,
this happens in only one case in millions, so I can ignore it for
now.
Thanks again,
Vinayakc
--
http://mail.python.org/mailman/listinfo/python-list
Vinayakc wrote:
> Yes Serge, I have removed the first character, but it is still raising
> an encoding exception.
Then I guess this character was used as a poor man's indentation tool, at
least at the beginning of your text. It's up to you to decide what to
do with that character; you have several choices
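A few possibilities, sketched in Python 3 (so the thread's `u''` literals become plain `str`); these are not necessarily the choices the reply went on to list. The sample string is hypothetical. The `ignore`, `replace` and `xmlcharrefreplace` error handlers are standard codec options, and switching to gb18030 is the alternative John Machin raises elsewhere in the thread.

```python
# Hypothetical sample: Chinese text "indented" with two no-break spaces.
text = "\u00a0\u00a0中文段落"

# Option 1: substitute an ordinary space before encoding.
print(text.replace("\u00a0", " ").encode("gb2312"))

# Option 2: let the codec drop or replace what it cannot encode.
print(text.encode("gb2312", errors="ignore"))
print(text.encode("gb2312", errors="replace"))            # each unencodable char -> b"?"

# Option 3: keep the character as an XML character reference.
print(text.encode("gb2312", errors="xmlcharrefreplace"))  # each no-break space -> b"&#160;"

# Option 4: use an encoding that covers the character.
print(text.encode("gb18030"))
```

Options 2 and 3 silently change the data, so they suit cases where the no-break spaces are only cosmetic, as they appear to be here.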
1. *By definition*, you can encode *any* Unicode string into utf-8.
Proves nothing.
2. \u00a0 [no-break space] has no equivalent in gb2312, nor in the
later gbk alias cp936. It does have an equivalent in the latest Chinese
encoding, gb18030.
3. gb2312 is outdated. It is not really an "appropriate"
Yes Serge, I have removed the first character, but it is still raising
an encoding exception.
Vinayakc wrote:
> Hi all,
>
> I am new to python.
>
> I have written a small application which reads data from an XML file and
> tries to encode the data using the appropriate charset.
> I am facing a problem while encoding a Chinese paragraph with charset
> "gb2312".
>
> code is:
>
> encoded_str = str_data.
Are you sure all the characters in the original text are in the "gb2312"
charset?
Encoding with "utf8" seems to work for this character (u'\xa0'), but I
don't know if the result is correct.
Could you post a sample of str_data as a unicode string?
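Whether every character is in gb2312 can also be checked mechanically, by scanning the text for characters the codec rejects. This is a minimal sketch in Python 3; the helper name `unencodable_chars` and the sample string are hypothetical. It encodes each character on its own, which is fine for a stateless codec like gb2312.

```python
import unicodedata


def unencodable_chars(text, encoding="gb2312"):
    """Return (index, char, name) for every character `encoding` cannot encode."""
    bad = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, char, unicodedata.name(char, "<unnamed>")))
    return bad


sample = "\u00a0中文\u00a0段落"  # hypothetical stand-in for str_data
for index, char, name in unencodable_chars(sample):
    print(index, repr(char), name)
```

Because it reports every offending position, not just the first, it also shows why removing only the first character can still leave an encoding exception.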
Hi all,
I am new to Python.
I have written a small application which reads data from an XML file and
tries to encode the data using the appropriate charset.
I am facing a problem while encoding a Chinese paragraph with charset
"gb2312".
The code is:
encoded_str = str_data.encode("gb2312")
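When that call fails, the exception itself says which character is at fault. A sketch assuming Python 3, where `str` is already Unicode; the `str_data` value below is a hypothetical stand-in, not the poster's data.

```python
str_data = "中文\u00a0段落"  # hypothetical: one no-break space in the middle

try:
    encoded_str = str_data.encode("gb2312")
except UnicodeEncodeError as exc:
    # exc.object is the string being encoded; [start:end] is the offending slice.
    bad = exc.object[exc.start:exc.end]
    print(exc.encoding, exc.start, repr(bad))
    # Show a little context around the failure.
    print(repr(exc.object[max(0, exc.start - 2):exc.end + 2]))
```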
The type of str_da