Re: Some questions about decode/encode

bbtestingbb Wed, 23 Jan 2008 21:51:40 -0800

On Jan 23, 8:49 pm, glacier <[EMAIL PROTECTED]> wrote:
> I use Chinese characters as an example here.
>
> >>>s1='你好吗'
> >>>repr(s1)
>
> "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
>
> >>>b1=s1.decode('GBK')
>
> My first question is: what strategy does 'decode' use to tell how
> to separate the words?


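decode() uses the GBK codec you specified to determine what
constitutes a character in your string. GBK is a variable-width
encoding: a byte in the range 0x00-0x7F is a single ASCII character,
while a lead byte in the range 0x81-0xFE starts a two-byte character.
The decoder reads one byte, decides from its value whether the next
byte belongs to the same character, and continues from there. Note
that it separates characters, not words: '你好吗' becomes three
characters, u'\u4f60\u597d\u5417'.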
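To make the lead-byte rule concrete, here is a rough sketch (Python 3, not how the real codec is implemented) that groups raw GBK bytes into one chunk per character. The function name split_gbk is made up for illustration, and it only checks the lead byte; the real GBK codec also validates the trail byte, so treat this as a simplified model:

```python
def split_gbk(data):
    """Group raw GBK bytes into one bytes chunk per character.

    Hypothetical helper, simplified: only the lead byte is inspected,
    the trail byte is not validated the way the real codec does.
    """
    chunks = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # 0x00-0x7F: a single-byte ASCII character
            chunks.append(data[i:i + 1])
            i += 1
        elif 0x81 <= b <= 0xFE and i + 1 < len(data):
            # lead byte: it and the following byte form one character
            chunks.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError("invalid or truncated GBK sequence at offset %d" % i)
    return chunks


raw = b"\xc4\xe3\xba\xc3\xc2\xf0"   # the bytes shown by repr(s1) above
pieces = split_gbk(raw)
print(pieces)                              # [b'\xc4\xe3', b'\xba\xc3', b'\xc2\xf0']
print([p.decode("gbk") for p in pieces])   # ['你', '好', '吗']
print(raw.decode("gbk") == "你好吗")        # True
```

Six bytes come out as three two-byte groups, which is exactly the three characters decode('GBK') produces.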

> My second question is: has anyone tested decoding very long mbcs
> text? I tried to decode a long (20+ MB) XML file yesterday, which behaved
> very strangely and caused SAX to fail to parse the decoded string.
> However, when I used another text editor to convert the file to UTF-8,
> SAX parsed the content successfully.
>
> I'm not sure whether some special byte sequence or the sheer length of
> the text caused this problem. Or maybe that's a bug in Python 2.5?

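That's probably too vague a description to determine why SAX isn't
doing what you expect. Posting the exact error message or traceback,
and the code that hands the decoded string to the parser, would make
it much easier to tell whether the problem lies in the data, in the
decode step, or in the way SAX is being called.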
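One way to take file size out of the picture is to decode the data in pieces. The sketch below (Python 3; decode_gbk_stream and chunk_size are illustrative names, not anything from the original post) uses the standard library's incremental GBK decoder, which buffers a two-byte character cut in half at a chunk boundary, and with errors="strict" raises UnicodeDecodeError on bytes that are not valid GBK instead of silently producing odd text:

```python
import codecs
import io


def decode_gbk_stream(stream, chunk_size=64 * 1024):
    """Yield str pieces decoded from a binary GBK stream, one chunk at a time."""
    decoder = codecs.getincrementaldecoder("gbk")(errors="strict")
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        # If a two-byte character is split across two reads, the decoder
        # holds on to the lead byte until the next block arrives.
        yield decoder.decode(block)
    # final=True raises UnicodeDecodeError if the data ends mid-character
    yield decoder.decode(b"", final=True)


data = "<doc>你好吗</doc>".encode("gbk")
# chunk_size=6 deliberately cuts the first two-byte character in half
text = "".join(decode_gbk_stream(io.BytesIO(data), chunk_size=6))
print(text == "<doc>你好吗</doc>")   # True
```

For a real file you would pass a stream opened with open(path, "rb"). If this decodes the whole 20 MB cleanly, size and chunking are unlikely to be the cause, and the next place to look is how the result is fed to SAX.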

-- 
http://mail.python.org/mailman/listinfo/python-list
