On Jan 24, 1:49 pm, [EMAIL PROTECTED] wrote:
> On Jan 23, 8:49 pm, glacier <[EMAIL PROTECTED]> wrote:
>
> > I use Chinese characters as an example here.
>
> > >>>s1='你好吗'
> > >>>repr(s1)
>
> > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
>
> > >>>b1=s1.decode('GBK')
>
> > My first question is: what strategy does 'decode' use to tell the way
> > to separate the words?
>
> decode() uses the GBK strategy you specified to determine what
> constitutes a character in your string.
>
> > My second question is: is there anyone who has tested a very long mbcs
> > decode? I tried to decode a long (20+MB) XML file yesterday, and the
> > result was very strange and caused SAX to fail to parse the decoded
> > string. However, when I used another text editor to convert the file
> > to UTF-8, SAX parsed the content successfully.
>
> > I'm not sure if some special byte array or too long a text caused this
> > problem. Or maybe that's a bug in Python 2.5?
>
> That's probably too vague of a description to determine why SAX isn't
> doing what you expect it to.

Do you mean I should post a copy of the XML document?

--
http://mail.python.org/mailman/listinfo/python-list
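
On the first question, here is a minimal sketch of how the GBK codec splits bytes into characters. It is written for Python 3, where the byte string is a bytes object, not for the Python 2.5 discussed in this thread. The helper name decode_in_chunks and its chunk_size parameter are hypothetical, made up for illustration. Nothing in the thread says the 20+MB file was decoded in pieces, so the chunked part only shows one way a multi-byte decode can go wrong and is not a diagnosis of the SAX failure.

```python
import codecs

# The six bytes from the example in this thread: '你好吗' encoded as GBK.
raw = b"\xc4\xe3\xba\xc3\xc2\xf0"

# One-shot decode: bytes below 0x80 are single-byte ASCII; a lead byte at or
# above 0x81 is paired with the byte that follows it to form one character.
text = raw.decode("gbk")
mixed = b"a\xc4\xe3".decode("gbk")  # one ASCII byte, then one two-byte character

# Cutting the input in the middle of a two-byte character makes a plain
# decode() fail, because the last lead byte has no trail byte.
try:
    raw[:3].decode("gbk")
    cut_failed = False
except UnicodeDecodeError:
    cut_failed = True


def decode_in_chunks(data, encoding="gbk", chunk_size=4096):
    """Decode bytes piece by piece (hypothetical helper, not from the thread)."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # The incremental decoder keeps a dangling lead byte until the next
        # chunk supplies its trail byte.
        parts.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if the input ends mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# chunk_size=1 forces every two-byte character to straddle a chunk boundary.
chunked = decode_in_chunks(raw, chunk_size=1)
print(text == chunked, len(text), cut_failed)
```

If a large file is read in fixed-size blocks and each block is decoded on its own, a character that straddles two blocks hits the same error as the cut example. An incremental decoder is one way to avoid that.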