Ben Finney <[EMAIL PROTECTED]> writes:

> glacier <[EMAIL PROTECTED]> writes:
>
> > I use chinese charactors as an example here.
> >
> > >>>s1='你好吗'
> > >>>repr(s1)
> > "'\\xc4\\xe3\\xba\\xc3\\xc2\\xf0'"
> > >>>b1=s1.decode('GBK')
> >
> > My first question is : what strategy does 'decode' use to tell the
> > way to seperate the words. I mean since s1 is an multi-bytes-char
> > string, how did it determine to seperate the string every 2bytes
> > or 1byte?
>
> The codec you specified ("GBK") is, like any character-encoding
> codec, a precise mapping between characters and bytes. It's almost
> certainly not aware of "words", only character-to-byte mappings.
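To make that concrete, here is a rough sketch (Python 2, matching the
session quoted above) of why the codec never has to guess where one
character ends and the next begins. The lead-byte test in the loop is a
simplification of GBK's actual byte-range rules, not the real codec's
implementation:

    # Bytes copied from the repr() in glacier's session above; this is
    # '你好吗' encoded as GBK (three characters, two bytes each).
    s1 = '\xc4\xe3\xba\xc3\xc2\xf0'

    u1 = s1.decode('gbk')
    print len(s1), len(u1)        # -> 6 3  (six bytes in, three characters out)

    # Simplified version of the rule the codec applies: a byte in the
    # ASCII range stands alone, while a byte of 0x81 or above is the
    # lead byte of a two-byte GBK character.  (The real codec also
    # validates the trailing byte and handles error cases.)
    i = 0
    while i < len(s1):
        if ord(s1[i]) >= 0x81:
            pair = s1[i:i+2]
            print repr(pair), '->', repr(pair.decode('gbk'))
            i += 2
        else:
            print repr(s1[i]), '->', repr(s1[i].decode('gbk'))
            i += 1

The point is that the decision is made byte by byte from the encoding's
own rules; no knowledge of "words" is needed or used.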
To be clear, I should point out that I didn't mean to imply static
tabular mappings only. The mappings in a character encoding are often
more complex and algorithmic. That doesn't make them any less precise,
of course; and the core point is that a character-mapping codec is
*only* about getting between characters and bytes, nothing else.

-- 
 \        "He who laughs last, thinks slowest."  -- Anonymous |
  `\                                                          |
_o__)                                                         |
Ben Finney
-- 
http://mail.python.org/mailman/listinfo/python-list