Rehceb Rotkiv wrote:

> #!/usr/bin/python
> import sys
> import codecs
> fileHandle = codecs.open(sys.argv[1], 'r', 'utf-8')
> fileString = fileHandle.read()
> print fileString
>
> if I call it from a Bash shell like this
>
> $ ./test.py testfile.utf8.txt
>
> it works just fine, but when I try to pipe the output to another process
> ("|") or into a file (">"), e.g. like this
>
> $ ./test.py testfile.utf8.txt | cat
>
> I get an error:
>
> Traceback (most recent call last):
>   File "./test.py", line 6, in ?
>     print fileString
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in
> position 538: ordinal not in range(128)
>
> I absolutely don't know what's the problem here, can you help?

Because you use codecs.open, you get Unicode when you read the file. When
you print a Unicode object, it is encoded using your terminal's encoding
(utf8, I presume?). But when you redirect the output, stdout is no longer
connected to your terminal, so no encoding can be assumed, and the default
encoding (ascii) is used instead. Try this line at the top:

    print "stdout:",sys.stdout.encoding,"default:",sys.getdefaultencoding()

I get

    stdout: ANSI_X3.4-1968 default: ascii

normally and

    stdout: None default: ascii

when redirected. You have to encode the Unicode object explicitly:

    print fileString.encode("utf-8")

(or use any other suitable encoding; I said utf-8 just because you read the
input file using that)

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list
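
As a variation on the explicit .encode("utf-8") call above, the output stream itself can be wrapped once so that every write is encoded. The following is only a minimal sketch, not part of the original reply: the helper name encode_to_stream is hypothetical, and an in-memory io.BytesIO stands in for a redirected stdout so the result can be inspected. It uses codecs.getwriter from the standard library and should run unchanged on Python 2.6+ and Python 3.3+.

```python
import codecs
import io


def encode_to_stream(text, byte_stream, encoding="utf-8"):
    """Write Unicode text to a byte stream using an explicit encoding.

    Hypothetical helper for illustration: codecs.getwriter returns a
    StreamWriter class that encodes everything written through it.
    """
    writer = codecs.getwriter(encoding)(byte_stream)
    writer.write(text)


# An in-memory byte stream stands in for a pipe or a redirected stdout.
buffer = io.BytesIO()
encode_to_stream(u"B\xe4r\n", buffer)
encoded = buffer.getvalue()

# With the ascii codec the same character cannot be encoded, which is
# the situation reported in the traceback of the original question.
ascii_failure = None
try:
    encode_to_stream(u"\xe4", io.BytesIO(), encoding="ascii")
except UnicodeEncodeError as error:
    ascii_failure = error
```

In the script from the question, the byte stream would presumably be sys.stdout on Python 2, or sys.stdout.buffer on Python 3. Wrapping the stream once avoids repeating .encode() at every print, at the cost of hard-coding the output encoding.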