Hi experts,
I'm new to Python. I'm writing crawlers to extract data from some
Chinese websites, and I've run into an encoding problem.
I have a unicode object that looks like this: u'\xd6\xd0\xce\xc4'
It is actually encoded in "gb2312", but I have no idea how to convert it
back to UTF-8
to re-create t
On Apr 1, 7:22 pm, Chris Rebert wrote:
> 2010/4/1 Mister Yu :
On Apr 1, 8:13 pm, Chris Rebert wrote:
> On Thu, Apr 1, 2010 at 4:38 AM, Mister Yu wrote:
> > On Apr 1, 7:22 pm, Chris Rebert wrote:
> >> 2010/4/1 Mister Yu :
On Apr 1, 9:31 pm, Stefan Behnel wrote:
> Mister Yu, 01.04.2010 14:26:
>
> > On Apr 1, 8:13 pm, Chris Rebert wrote:
> >> gb2312_bytes = ''.join([chr(ord(c)) for c in u'\xd6\xd0\xce\xc4'])
> >> unicode_string = gb2312_bytes.decode('gb2312
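For reference, here is a minimal sketch of the same idea as the quoted snippet, written so that it should run on both Python 2 and Python 3. It assumes the unicode object really does hold the raw gb2312 bytes as code points in the range 0-255, which is what the example value in the original question suggests. The helper name recover_text is hypothetical and not something from this thread. Instead of joining chr(ord(c)) by hand, it uses the latin-1 codec, which maps code points 0-255 one-to-one onto bytes 0-255.

```python
# -*- coding: utf-8 -*-
# Sketch only: recover_text is a hypothetical helper name, not from the thread.


def recover_text(mislabeled, real_encoding="gb2312"):
    """Rebuild proper text from a string whose code points are really raw bytes."""
    # latin-1 maps code points 0-255 one-to-one onto bytes 0-255,
    # so this step recovers the original byte sequence unchanged.
    raw_bytes = mislabeled.encode("latin-1")
    # Decode those bytes with the encoding the website actually used.
    return raw_bytes.decode(real_encoding)


# The value from the original question.
text = recover_text(u"\xd6\xd0\xce\xc4")

# Encode to UTF-8 only when bytes are needed, e.g. for writing to a file.
utf8_bytes = text.encode("utf-8")
```

If the string contains any code point above 255, the encode("latin-1") step raises UnicodeEncodeError, which is a useful signal that the data was not mis-decoded in this particular way. The cleaner fix is probably upstream: decode the downloaded page bytes with the right encoding in the crawler so the mislabeled unicode object never appears.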