On Apr 1, 9:31 pm, Stefan Behnel wrote:
> Mister Yu, 01.04.2010 14:26:
>
> > On Apr 1, 8:13 pm, Chris Rebert wrote:
> >> gb2312_bytes = ''.join([chr(ord(c)) for c in u'\xd6\xd0\xce\xc4'])
> >> unicode_string = gb2312_bytes.decode('gb2312')
> >> utf8_bytes = unicode_string.encode('utf-8') #as you w
Mister Yu, 01.04.2010 14:26:
> On Apr 1, 8:13 pm, Chris Rebert wrote:
>> gb2312_bytes = ''.join([chr(ord(c)) for c in u'\xd6\xd0\xce\xc4'])
>> unicode_string = gb2312_bytes.decode('gb2312')
>> utf8_bytes = unicode_string.encode('utf-8') #as you wanted
Simplifying this hack a bit:
gb2312_bytes = u'\x
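
For readers following the thread, the round trip being discussed (treat each code point of the mis-decoded unicode object as a raw byte, decode those bytes as GB2312, then re-encode as UTF-8) can be sketched as below. This is not the exact code from the truncated message above, only a minimal sketch. It assumes every code point in the input is below 256, so that the 'latin-1' codec maps them one-to-one onto bytes, and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# A unicode object that actually holds GB2312 bytes, one byte per code point.
mojibake = u'\xd6\xd0\xce\xc4'

# latin-1 maps code points 0-255 directly to the bytes 0x00-0xFF,
# which recovers the original encoded byte string.
gb2312_bytes = mojibake.encode('latin-1')

# Decode the bytes with the encoding they were really written in.
text = gb2312_bytes.decode('gb2312')   # u'\u4e2d\u6587'

# Re-encode as UTF-8 if that is what downstream code expects.
utf8_bytes = text.encode('utf-8')      # b'\xe4\xb8\xad\xe6\x96\x87'
```

This only repairs the symptom. If the crawler decodes the page with the correct charset in the first place, the unicode object never ends up holding raw bytes and no repair step is needed.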
On Apr 1, 8:13 pm, Chris Rebert wrote:
> On Thu, Apr 1, 2010 at 4:38 AM, Mister Yu wrote:
> > On Apr 1, 7:22 pm, Chris Rebert wrote:
> >> 2010/4/1 Mister Yu :
> >> > hi experts,
>
> >> > i m new to python, i m writing crawlers to extract data from some
> >> > chinese websites, and i run into a e
Mister Yu, 01.04.2010 13:38:
> i m still not very sure how to convert a unicode object
> u'\xd6\xd0\xce\xc4' back to "中文", the string it is supposed to be?
You are confused. '\xd6\xd0\xce\xc4' is an encoded byte string, not a
unicode string. The fact that you have it stored in a unicode string
On Thu, Apr 1, 2010 at 4:38 AM, Mister Yu wrote:
> On Apr 1, 7:22 pm, Chris Rebert wrote:
>> 2010/4/1 Mister Yu :
>> > hi experts,
>>
>> > i m new to python, i m writing crawlers to extract data from some
>> > chinese websites, and i run into a encoding problem.
>>
>> > i have a unicode object, w
On Apr 1, 7:22 pm, Chris Rebert wrote:
> 2010/4/1 Mister Yu :
>
> > hi experts,
>
> > i m new to python, i m writing crawlers to extract data from some
> > chinese websites, and i run into a encoding problem.
>
> > i have a unicode object, which looks like this u'\xd6\xd0\xce\xc4'
> > which is enc
2010/4/1 Mister Yu :
> hi experts,
>
> i m new to python, i m writing crawlers to extract data from some
> chinese websites, and i run into a encoding problem.
>
> i have a unicode object, which looks like this u'\xd6\xd0\xce\xc4'
> which is encoded in "gb2312",
No! Instances of type 'unicode' (i.