On Jan 24, 3:29 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> On Thu, 24 Jan 2008 04:52:22 -0200, glacier <[EMAIL PROTECTED]> wrote:
>
> > According to your reply, what will happen if I try to decode a long
> > string separately.
> > I mean:
> > ######################################
> > a='你好吗'*100000
> > s1 = u''
> > cur = 0
> > while cur < len(a):
> >     d = min(len(a)-cur,1023)
> >     s1 += a[cur:cur+d].decode('mbcs')
> >     cur += d
> > ######################################
>
> > May the code above produce any bogus characters in s1?
>
> Don't do that. You might be splitting the input string at a point that is
> not a character boundary. You won't get bogus output, decode will raise a
> UnicodeDecodeError instead.
> You can control how errors are handled, see
> http://docs.python.org/lib/string-methods.html#l2h-237
>
> --
> Gabriel Genellina
Thanks Gabriel,

I guess I understand what will happen if I don't split the string at a
character boundary. What I'm not sure about is whether the decode method
itself can mis-split a character boundary. Can you tell me? Thanks a lot.
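
To make the question concrete, here is a minimal sketch of the two cases, written for Python 3. It rests on some assumptions that are not in the thread: 'gbk' stands in for 'mbcs' (which only exists on Windows), the helper names decode_naive and decode_incremental are made up for illustration, and the string is shorter than in the original loop. The first helper decodes each 1023-byte chunk on its own, as the loop above does. The second uses an incremental decoder from the standard codecs module, which, as far as I can tell, holds back an incomplete trailing byte until the next chunk arrives.

```python
import codecs

# Assumption: 'gbk' stands in for 'mbcs', which is only available on Windows.
ENCODING = "gbk"

# Each of these characters takes 2 bytes in gbk, so the data is 6000 bytes.
data = ("你好吗" * 1000).encode(ENCODING)

# An odd chunk size means the first chunk ends in the middle of a character.
CHUNK = 1023


def decode_naive(raw, size):
    """Decode every chunk independently (hypothetical helper).

    Raises UnicodeDecodeError when a chunk ends inside a character.
    """
    return "".join(
        raw[i:i + size].decode(ENCODING) for i in range(0, len(raw), size)
    )


def decode_incremental(raw, size):
    """Decode chunk by chunk with an incremental decoder (hypothetical helper).

    Incomplete trailing bytes are kept inside the decoder and joined with
    the start of the next chunk.
    """
    decoder = codecs.getincrementaldecoder(ENCODING)()
    parts = [decoder.decode(raw[i:i + size]) for i in range(0, len(raw), size)]
    # final=True makes the decoder complain about any leftover partial bytes.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

With these definitions, decode_incremental(data, CHUNK) should give back the original text, while decode_naive(data, CHUNK) should fail on the first chunk because byte 1023 is only the first half of a character.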