John Machin wrote:
(1) You cannot work without using byte sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using an encoding that is expected by the consumer (or use an
output method that will do it for you).

(2) You don't need to use bytes to specify a Unicode code point. Just use
an escape sequence e.g. "\u0404" is a Cyrillic character.


Thanks John. In reverse order, I understand point (2). I'm less clear on point (1).

If I generate a string of characters that I presume to be ASCII/UTF-8 (no \u0404-type characters) and write them to a file (or stdout), how does the default encoding affect that file? I'm not seeing anything unusual going on. Does it matter whether I open the file with vi, gedit, or emacs?
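
For what it's worth, here is a quick check of why nothing unusual shows up with plain ASCII text. It is only a sketch, assuming Python 3; the variable names are made up for illustration. Pure-ASCII text produces identical bytes under ASCII, UTF-8 and Latin-1, so the choice of encoding only becomes visible once a non-ASCII character such as "\u0404" is involved.

```python
# Pure-ASCII text: the bytes are identical under any ASCII-compatible encoding.
text = "plain ascii text\n"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
same = ascii_bytes == utf8_bytes == latin1_bytes
print("identical bytes:", same)

# A non-ASCII character is where the choice of encoding starts to matter.
ch = "\u0404"
ch_utf8 = ch.encode("utf-8")          # two bytes in UTF-8
print("UTF-8 bytes:", ch_utf8)

# Latin-1 has no slot for this character, so encoding fails outright.
try:
    ch.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print("fits in Latin-1:", latin1_ok)
```

If that is right, a file of ASCII-only text should look the same in vi, gedit, or emacs whichever of those encodings the editor assumes.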

....

Another question: in mail I'm receiving many small blocks that look like sprites, each containing four small hex digits, scattered about the message, mostly where punctuation would be, I think. I'm guessing these are Unicode code points. If so, what is the best way to 'guess' the encoding? Is it declared somewhere in the stream or the protocol?
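
As far as I can tell, those boxes are usually what a font renderer draws when it has no glyph for a character, with the code point in hex inside, and MIME mail normally declares its encoding in the charset parameter of the Content-Type header. A minimal sketch of reading that with the standard email package, assuming Python 3; the message below is invented for illustration, not one of the mails in question:

```python
from email import message_from_bytes

# A made-up message: the body holds UTF-8 bytes for an accented letter
# and a pair of curly quotes, and the header says so.
raw = (
    b"From: someone@example.invalid\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xc3\xa9 \xe2\x80\x9cquoted\xe2\x80\x9d\r\n"
)

msg = message_from_bytes(raw)

# The declared charset, lower-cased, or None if the header does not give one.
charset = msg.get_content_charset()
print("declared charset:", charset)

# decode=True undoes the transfer encoding and returns the raw body bytes;
# turning them into text is a separate step using the declared charset.
body_bytes = msg.get_payload(decode=True)
body = body_bytes.decode(charset or "ascii", errors="replace")
print(body.strip())
```

When the header is missing or wrong, guessing is all that is left, which is why the sketch falls back to ASCII with errors="replace" rather than failing.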

thanks
--
http://mail.python.org/mailman/listinfo/python-list
