glacier <[EMAIL PROTECTED]> writes:

> I use Chinese characters as an example here.
>
> >>> s1='你好吗'
> >>> repr(s1)
> "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
> >>> b1=s1.decode('GBK')
>
> My first question is: what strategy does 'decode' use to tell how
> to separate the words? I mean, since s1 is a multi-byte-character
> string, how did it determine whether to separate the string every
> 2 bytes or every 1 byte?

The codec you specified ("GBK") is, like any character-encoding codec, a
precise mapping between characters and bytes. It is almost certainly not
aware of "words"; it knows only character-to-byte mappings, and those
mappings determine how many bytes each character occupies.

-- 
 \     "When I get new information, I change my position. What, sir, |
  `\          do you do with new information?" -- John Maynard Keynes |
_o__)                                                                 |
Ben Finney
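
To make the point about character-to-byte mappings concrete, here is a minimal sketch of how a GBK decoder can tell one-byte characters from two-byte ones. It is written for Python 3 (the quoted session is Python 2, where a plain string holds bytes), and `split_gbk` is a hypothetical helper for illustration, not part of any library. It assumes a simplified rule: a byte below 0x80 is a one-byte ASCII character, and any other byte is the lead byte of a two-byte character. The real codec also validates each byte pair against its mapping table.

```python
# Python 3 sketch; in the Python 2 session above, s1 already holds these bytes.
raw = b'\xc4\xe3\xba\xc3\xc2\xf0'  # the bytes shown by repr(s1)


def split_gbk(data):
    """Hypothetical helper: group GBK-encoded bytes into per-character chunks.

    Simplified rule (an assumption for illustration): a byte below 0x80 is a
    one-byte character; any other byte starts a two-byte character.
    """
    chunks = []
    i = 0
    while i < len(data):
        width = 1 if data[i] < 0x80 else 2
        chunks.append(data[i:i + width])
        i += width
    return chunks


chunks = split_gbk(raw)
text = raw.decode('gbk')  # the real codec performs the full table lookup

print(chunks)     # three two-byte chunks
print(len(text))  # 3 characters, not 6

# Mixed ASCII and Chinese text: the chunk width varies from byte to byte.
mixed = split_gbk('a你b'.encode('gbk'))
print(mixed)
```

So the decoder never splits "every 2 bytes" or "every 1 byte" across the whole string; it decides one character at a time from the value of the byte it is looking at.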