François Pinard <[EMAIL PROTECTED]> writes:

> Hi, people.  I hope someone would like to enlighten me.
>
> For any application handling Unicode internally, I'm usually careful
> to properly convert those Unicode strings into 8-bit strings before
> writing them out.
>
> However, this morning, I mistakenly forgot to do so before using one
> Unicode string (containing a non-ASCII character) as an argument to
> the `print' statement, and I did _not_ get an error.  This is rather
> surprising to me.  I reread the section of the Python reference manual
> (version 2.3.4, this machine uses 2.3.3 currently), and it does not say
> anything about special processing for Unicode strings.
>
> In my understanding, when `print' is given an argument which is not
> already a string (I read: 8-bit string), it first gets converted into
> a string (I read: calling __str__).  But if I call `str()' explicitly,
> _then_ I get an error as expected.  The question is, why is there no
> error if I do not call `str()' explicitly?
>
> For example, given file `question.py' with these contents:
>
>   # -*- coding: UTF-8 -*-
>   texte = unicode("Fran\xe7ois", 'latin1')
>   print type(texte), repr(texte), texte
>   print type(texte), repr(texte), str(texte)
>
> doing `python question.py' yields:
>
>   <type 'unicode'> u'Fran\xe7ois' François
>   <type 'unicode'> u'Fran\xe7ois'
>   Traceback (most recent call last):
>     File "question.py", line 4, in ?
>       print type(texte), repr(texte), str(texte)
>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' \
>     in position 4: ordinal not in range(128)
>
> (last line wrapped for legibility).
>
> So (trying to be crystal clear), why is the first `print' working over
> its third argument, but not the second?  How does `print' convert that
> Unicode string to an 8-bit string for output, if not through `str()'?
> What is missing from the documentation, or from my way of
> understanding it?

AFAIK, print uses sys.stdout.encoding to encode the unicode string,
whereas str() uses the default encoding, which is ASCII.  That would
explain why only the explicit str() call raises the error.

Thomas
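
To make the difference concrete, here is a minimal sketch, assuming
print really does encode with sys.stdout.encoding as suggested above.
The helper encode_for_output is hypothetical (it is not part of Python);
it only imitates the assumed behaviour by encoding with the stream's
encoding and falling back to ASCII when the stream has none.  It is
written so that it runs unchanged on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_output(text, encoding=None):
    """Hypothetical helper: encode `text` roughly the way print is
    assumed to, using the output stream's encoding when no explicit
    encoding is given, and ASCII when the stream has none."""
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding)


texte = u"Fran\xe7ois"

# With a terminal encoding such as Latin-1 or UTF-8, encoding succeeds,
# which corresponds to the first print in question.py.
latin1_bytes = encode_for_output(texte, "latin-1")
utf8_bytes = encode_for_output(texte, "utf-8")

# With ASCII (the codec named in the traceback from str()), the
# non-ASCII character cannot be encoded, as in the second print.
try:
    encode_for_output(texte, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Calling encode_for_output(texte) with no second argument uses whatever
encoding the current stdout reports, so the result depends on the
terminal and on whether the output is redirected.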