I've used the code below successfully to deal with this kind of problem when outputting filenames. python2x3 is at http://stromberg.dnsalias.org/svn/python2x3/, but here it's only being used to convert Python 3.x's byte strings to strings (to eliminate the b'' stuff); on 2.x it's an identity function. If you're targeting 3.x alone, there's no need to take a dependency on python2x3.
If you really do need to output such characters, rather than replacing them with ?'s, you could use os.write() on file descriptor 1 - that works in both 2.x and 3.x.

    import sys

    import python2x3


    def ascii_ize(binary):
        '''Replace non-ASCII characters with question marks; otherwise writing to sys.stdout tracebacks'''
        list_ = []
        question_mark_ordinal = ord('?')
        for ordinal in python2x3.binary_to_intlist(binary):
            # Keep 7-bit values as they are; anything else becomes '?'
            if 0 <= ordinal <= 127:
                list_.append(ordinal)
            else:
                list_.append(question_mark_ordinal)
        return python2x3.intlist_to_binary(list_)


    def output_filename(filename, add_eol=True):
        '''Output a filename to the tty (stdout), taking into account that some tty's do not allow non-ASCII characters'''
        # Only the literal name 'US-ASCII' is matched here; any other
        # spelling of the encoding falls through to the else branch.
        if sys.stdout.encoding == 'US-ASCII':
            converted = python2x3.binary_to_string(ascii_ize(filename))
        else:
            converted = python2x3.binary_to_string(filename)

        # Newlines, carriage returns and tabs inside a filename would
        # garble the listing, so they are replaced as well.
        replaced = converted.replace('\n', '?').replace('\r', '?').replace('\t', '?')
        sys.stdout.write(replaced)

        if add_eol:
            sys.stdout.write('\n')

On Fri, Jul 15, 2011 at 5:02 PM, Pedro Abranches <pedrof.abranc...@gmail.com> wrote:

> Hello everyone.
>
> I'm having a problem when outputing UTF-8 strings to a console.
> Let me show a simple example that explains it:
>
> $ python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'
> UTF-8
> é
>
> It's everything ok.
> Now, if you're using your python script in some shell script you might have
> to store the output in some variable, like this:
>
> $ var=`python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'`
>
> And what you get is:
>
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
> position 0: ordinal not in range(128)
>
> So, python is not being able to detect the encoding of the output in a
> situation like that, in which the python script is called not directly but
> around ``.
>
> Why does happen? Is there a way to solve it either by python or by shell
> code?
>
> Thanks,
> Pedro Abranches
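
To make the os.write() suggestion concrete, here is a minimal sketch. The helper name write_raw and its fd parameter are mine, purely for illustration; it assumes the data has already been encoded to bytes (a plain str on 2.x, bytes on 3.x), and I have not tried it on every platform.

```python
import os
import sys


def write_raw(data, fd=1):
    """Write already-encoded bytes straight to a file descriptor.

    This bypasses sys.stdout, so its encoding is never consulted.
    write_raw is a hypothetical helper name, not part of any library.
    """
    # Flush whatever sys.stdout has buffered so the output stays in order.
    sys.stdout.flush()
    total = 0
    # os.write() may write fewer bytes than requested, so loop until done.
    while total < len(data):
        total += os.write(fd, data[total:])
    return total


# Example call (commented out so importing this does not print anything):
# write_raw(u"\xe9".encode("utf-8") + b"\n")
```

Because the bytes go directly to the descriptor, it makes no difference whether stdout is a tty or is being captured with backticks; the caller picks the encoding explicitly (UTF-8 in the example call).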
-- http://mail.python.org/mailman/listinfo/python-list