Re: Encode exception for chinese text

2006-05-19 Thread John Machin
MvL wrote: > Also, *by definition*, though :-) Ah yes, indeed; and thanks for reminding me. Aside: Similar definition, but not similar design: IMHO utf-8 sits on top of ASCII like a rose on a stalk, whereas gb18030 sits on top of gb2312 like a rhinoceros on a unicycle :-) Cheers, John -- http://

Re: Encode exception for chinese text

2006-05-19 Thread Martin v. Löwis
John Machin wrote: > 1. *By definition*, you can encode *any* Unicode string into utf-8. > Proves nothing. > 2. \u00a0 [no-break space] has no equivalent in gb2312, nor in the > later gbk alias cp936. It does have an equivalent in the latest Chinese > encoding, gb18030. Also, *by definition*, thou

Re: Encode exception for chinese text

2006-05-19 Thread Vinayakc
Hey Serge, John, Thank you very much. I was really not aware of these facts. Anyway, this happens only once in millions, so I can ignore it for now. Thanks again, Vinayakc -- http://mail.python.org/mailman/listinfo/python-list

Re: Encode exception for chinese text

2006-05-19 Thread Serge Orlov
Vinayakc wrote: > Yes Serge, I have removed the first character, but it still > raises an encoding exception. Then I guess this character was used as a poor man's indentation tool, at least at the beginning of your text. It's up to you to decide what to do with that character, you have several choices

Re: Encode exception for chinese text

2006-05-19 Thread John Machin
1. *By definition*, you can encode *any* Unicode string into utf-8. Proves nothing. 2. \u00a0 [no-break space] has no equivalent in gb2312, nor in the later gbk alias cp936. It does have an equivalent in the latest Chinese encoding, gb18030. 3. gb2312 is outdated. It is not really an "appropriate"
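The points above about which codecs can represent U+00A0 can be checked directly. The following is a minimal sketch, assuming Python 3 and its standard codec names; the helper `can_encode` is a hypothetical name introduced for illustration and is not from the thread.

```python
# Sketch (Python 3 assumed): check which codecs can represent NO-BREAK SPACE.
nbsp = "\u00a0"

def can_encode(text, encoding):
    """Hypothetical helper: True if `text` encodes without error."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Try the encodings mentioned in the thread.
results = {
    name: can_encode(nbsp, name)
    for name in ("utf-8", "gb2312", "gbk", "gb18030")
}

for name, ok in results.items():
    print(name, "can encode U+00A0" if ok else "cannot encode U+00A0")
```

With Python's bundled codecs, only utf-8 and gb18030 should succeed here, which matches points 1 and 2.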

Re: Encode exception for chinese text

2006-05-19 Thread Vinayakc
Yes Serge, I have removed the first character, but it still raises an encoding exception.

Re: Encode exception for chinese text

2006-05-19 Thread Serge Orlov
Vinayakc wrote: > Hi all, > > I am new to Python. > > I have written a small application that reads data from an XML file and > tries to encode the data using the appropriate charset. > I am facing a problem while encoding one Chinese paragraph with charset > "gb2312". > > The code is: > > encoded_str = str_data.

Re: Encode exception for chinese text

2006-05-19 Thread swordsp
Are you sure all the characters in the original text are in the "gb2312" charset? Encoding with "utf8" seems to work for this character (u'\xa0'), but I don't know if the result is correct. Could you give a subset of str_data in unicode?
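One way to answer the question of whether every character fits in gb2312 is to test each character on its own. This is a rough sketch, assuming Python 3; the function name `unencodable_chars` and the sample string are made up for illustration and do not come from the original poster's data.

```python
# Sketch (Python 3 assumed): list the characters a codec cannot represent.
def unencodable_chars(text, encoding="gb2312"):
    """Return (index, character) pairs that `encoding` cannot encode."""
    bad = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, ch))
    return bad

# Hypothetical sample: a no-break space followed by two Chinese characters.
sample = "\u00a0\u4e2d\u6587"
problems = unencodable_chars(sample)

for index, ch in problems:
    print("position %d: U+%04X" % (index, ord(ch)))
```

Running the same check against the real `str_data` would show whether U+00A0 is the only offending character or one of several.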

Encode exception for chinese text

2006-05-19 Thread Vinayakc
Hi all, I am new to Python. I have written a small application that reads data from an XML file and tries to encode the data using the appropriate charset. I am facing a problem while encoding one Chinese paragraph with charset "gb2312". The code is: encoded_str = str_data.encode("gb2312") The type of str_da