On Jan 23, 8:49 pm, glacier <[EMAIL PROTECTED]> wrote:
> I use Chinese characters as an example here.
>
> >>> s1='你好吗'
> >>> repr(s1)
> "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
> >>> b1=s1.decode('GBK')
>
> My first question is: what strategy does 'decode' use to tell the way
> to separate the words?

decode() uses the encoding you specify, here GBK, to decide how many bytes make up each character in your string.

> My second question is: is there anyone who has tested very long mbcs
> decode? I tried to decode a long (20+ MB) xml yesterday, which turns out
> to be very strange and causes SAX to fail to parse the decoded string.
> However, I use another text editor to convert the file to utf-8 and
> SAX will parse the content successfully.
>
> I'm not sure if some special byte array or too long text caused this
> problem. Or maybe that's a BUG of python 2.5?

That's probably too vague a description to determine why SAX isn't doing what you expect it to.
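
To make the first answer concrete, here is a rough sketch of how a GBK decoder groups bytes into characters, using the six bytes from the example above. It is written with Python 3 bytes literals rather than the Python 2.5 str used in the original post, and split_gbk and decode_chunks are hypothetical helper names invented for illustration. The splitting rule is simplified (it does no validation of trail bytes), so treat it as an illustration of the idea and not as the codec's actual implementation. The second helper shows one way to decode input that arrives in pieces; it is not a diagnosis of the SAX problem.

```python
import codecs

# The six bytes from the example: '你好吗' encoded as GBK.
raw = b'\xc4\xe3\xba\xc3\xc2\xf0'


def split_gbk(data):
    """Group GBK bytes per character (hypothetical helper, simplified rule)."""
    groups = []
    i = 0
    while i < len(data):
        # Bytes below 0x80 are single-byte (ASCII) characters; a lead byte
        # in the range 0x81-0xFE starts a two-byte character.
        width = 2 if 0x81 <= data[i] <= 0xFE else 1
        groups.append(data[i:i + width])
        i += width
    return groups


def decode_chunks(chunks, encoding='gbk'):
    """Decode a sequence of byte chunks (hypothetical helper).

    The incremental decoder buffers a trailing lead byte until the next
    chunk arrives, so a chunk boundary may fall inside a character.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b'', final=True))
    return ''.join(parts)


groups = split_gbk(raw)
text = raw.decode('gbk')

# Split the input in the middle of the second character on purpose.
chunked = decode_chunks([raw[:3], raw[3:]])
```

Decoding each chunk separately with raw[:3].decode('gbk') would raise an error here, because the third byte is a lead byte with no trail byte; that is why the chunked version goes through the incremental decoder.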