Re: unicode by default

harrismh777 Wed, 11 May 2011 18:28:14 -0700

John Machin wrote:

(1) You cannot work without using bytes sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using an encoding that is expected by the consumer (or use an
output method that will do it for you).


(2) You don't need to use bytes to specify a Unicode code point. Just use
an escape sequence e.g. "\u0404" is a Cyrillic character.
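Both points can be sketched in a few lines. This is only an illustration, assuming Python 3 (where str is Unicode by default); the variable names are made up for the example.

```python
import unicodedata

# Point (2): an escape sequence names a code point directly; no bytes involved.
ch = "\u0404"
name = unicodedata.name(ch)  # official Unicode name of the character

# Point (1): bytes only appear at the boundary, when the text is encoded
# for a file, a socket, or a terminal.
utf8_bytes = ch.encode("utf-8")          # the two-byte UTF-8 form
round_trip = utf8_bytes.decode("utf-8")  # decoding needs the same encoding

print(name, utf8_bytes, round_trip == ch)
```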

Thanks John. In reverse order, I understand point (2). I'm less clear on point (1).

If I generate a string of characters that I presume to be ascii/utf-8 (no \u0404 type characters) and write them to a file (stdout), how does default encoding affect that file, by default..? I'm not seeing that there is anything unusual going on... If I open the file with vi? If I open the file with gedit? emacs?
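One way to check this, as a rough sketch assuming Python 3: encode the same pure-ASCII text under a few common encodings and compare the bytes. The sample string and the list of encodings are just assumptions for the illustration.

```python
# Pure-ASCII text: every character is below U+0080.
text = "plain ascii text\n"

# Encode under several ASCII-compatible encodings and compare the results.
encodings = ["ascii", "utf-8", "latin-1"]
encoded = {enc: text.encode(enc) for enc in encodings}
all_same = len(set(encoded.values())) == 1

# A non-ASCII character is where the choice of encoding starts to matter.
non_ascii = "\u0404"
differs = non_ascii.encode("utf-8") != non_ascii.encode("utf-16-le")

print(all_same, differs)
```

If the bytes are identical, the file on disk is identical too, which would explain why vi, gedit and emacs all show nothing unusual for ASCII-only output.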


....

Another question... in mail I'm receiving many small blocks that look like sprites with four small hex codes, scattered about the mail... mostly punctuation, maybe? ... guessing, are these unicode code points, and if so what is the best way to 'guess' the encoding? ... is it coded in the stream somewhere... protocol?
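On the "is it coded in the stream" part: a MIME message normally declares its character set in the Content-Type header, so guessing is only needed when that header is missing or wrong. Boxes showing hex digits are typically what a font renderer draws when it has no glyph for a code point. A minimal sketch using the standard library email package, assuming Python 3; the sample message below is invented for the illustration.

```python
import email

# A made-up message whose body contains curly quotes, sent as
# quoted-printable UTF-8 (a common combination in real mail).
raw = (
    b"From: sender@example.com\r\n"
    b"Content-Type: text/plain; charset=\"utf-8\"\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"=E2=80=9Cquoted=E2=80=9D\r\n"
)

msg = email.message_from_bytes(raw)

# The declared charset, if any; fall back to ASCII when it is absent.
charset = msg.get_content_charset() or "ascii"

# decode=True undoes the transfer encoding and returns raw bytes,
# which are then decoded with the declared charset.
body = msg.get_payload(decode=True).decode(charset, errors="replace")

print(charset, body.strip())
```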


thanks
--
http://mail.python.org/mailman/listinfo/python-list
