Vinayakc wrote:
> Hi all,
>
> I am new to python.
>
> I have written a small application which reads data from an XML file and
> tries to encode the data using the appropriate charset.
> I am facing a problem while encoding one Chinese paragraph with charset
> "gb2312".
>
> code is:
>
> encoded_str = str_data.encode("gb2312")
>
> The type of str_data is <type 'unicode'>
>
> The exception is:
>
> "UnicodeEncodeError: 'gb2312' codec can't encode character u'\xa0' in
> position 0: illegal multibyte sequence"

Hmm, this is a 'no-break space' (U+00A0) at the very beginning of the text. It looks suspiciously like a plain-text UTF-8 signature, which is 'zero width no-break space' (U+FEFF). If you strip the first character, do you still have encoding errors?
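
One way to check this is sketched below. It is only an illustration: the sample string is made up (a no-break space followed by two Chinese characters), and it assumes str_data is already a unicode object, as in the original post. The variable name cleaned is hypothetical.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical sample data: U+00A0 followed by two Chinese characters.
str_data = u"\xa0\u4e2d\u6587"

# Identify the character the codec complained about.
print(unicodedata.name(str_data[0]))  # NO-BREAK SPACE

# Reproduce the failure and report where it happens.
try:
    str_data.encode("gb2312")
except UnicodeEncodeError as exc:
    print("cannot encode %r at position %d" % (exc.object[exc.start], exc.start))

# Strip any leading no-break space (U+00A0) or UTF-8 signature (U+FEFF),
# then try the encoding again.
cleaned = str_data.lstrip(u"\xa0\ufeff")
encoded_str = cleaned.encode("gb2312")
print(repr(encoded_str))
```

If the cleaned text still fails to encode, the start attribute of the exception points at the next character that gb2312 cannot represent.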
