On Thursday, 8 August 2013 18:27:06 UTC+2, Kurt Mueller wrote:
> Now I have this small example:
>
> ----------------------------------------------------------
> #!/usr/bin/env python
> # vim: set fileencoding=utf-8 :
>
> from __future__ import print_function
> import sys, shlex
>
> print( repr( sys.stdin.encoding ) )
>
> strg_form = u'{0:>3} {1:>3} {2:>3} {3:>3} {4:>3}'
> for inpt_line in sys.stdin:
>     proc_line = shlex.split( inpt_line, False, True, )
>     encoding = "utf-8"
>     proc_line = [ strg.decode( encoding ) for strg in proc_line ]
>     print( strg_form.format( *proc_line ) )
> ----------------------------------------------------------
>
> $ echo -e "a b c d e\na ö u 1 2" | file -
> /dev/stdin: UTF-8 Unicode text
> $ echo -e "a b c d e\na ö u 1 2" | ./align_compact.py
> None
> a b c d e
> a ö u 1 2
> $ echo -e "a b c d e\na ö u 1 2" | recode utf8..latin9 | file -
> /dev/stdin: ISO-8859 text
> $ echo -e "a b c d e\na ö u 1 2" | recode utf8..latin9 | ./align_compact.py
> None
> a b c d e
> Traceback (most recent call last):
>   File "./align_compact.py", line 13, in <module>
>     proc_line = [ strg.decode( encoding ) for strg in proc_line ]
>   File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 0: invalid start byte
> muk@mcp20:/sw/prog/scripts/text_manip>
>
> How do I handle this two inputs?
>
> TIA
> --
> Kurt Mueller

It's very easy. The error message tells you that you cannot decode your series of bytes with the utf-8 codec, simply because your string is encoded in iso-8859-* (you did that explicitly, with recode!). The byte 0xf6 is "ö" in latin-1 and latin9, and it can never start a UTF-8 sequence.

Your problem is not Python; your problem is the encoding of the characters. You should know the encoding of the strings you are manipulating (or creating) and, where necessary, decode and/or encode them to suit what you want, e.g. an encoding suitable for the display. It is only at that level that Python (or any language) matters.

The sys.std*.encoding is a different problem.

iso-8859-*? iso-8859-1 == latin-1 and latin9 == iso-8859-15. Apart from "das grosse Eszett", both encodings can handle German (which seems to be your case), and there are no problems when working directly with them.

jmf
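
To make the advice above concrete, here is a minimal sketch, assuming the input can only be UTF-8 or latin9 (iso-8859-15). The helper name decode_with_fallback and the candidate list are made up for illustration; they are not part of the original script. UTF-8 is tried first because it is strict and rejects most latin9 text, whereas iso-8859-15 accepts any byte sequence and therefore has to come last. This is a guess, not a detection: if you know the real encoding of your input, name it explicitly instead.

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes the bytes are either UTF-8 or iso-8859-15 (latin9).
# decode_with_fallback is a hypothetical helper, not a library function.

def decode_with_fallback(raw, encodings=("utf-8", "iso-8859-15")):
    """Return (text, codec_name) for the first codec that decodes raw."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            # These bytes are not valid in this codec; try the next one.
            continue
    raise ValueError("none of the candidate encodings fits the input")


# The second input line from the question, encoded both ways
# (u"\u00f6" is "ö").
line = u"a \u00f6 u 1 2"
utf8_bytes = line.encode("utf-8")
latin9_bytes = line.encode("iso-8859-15")

text_a, codec_a = decode_with_fallback(utf8_bytes)
text_b, codec_b = decode_with_fallback(latin9_bytes)

# Both inputs end up as the same unicode string.
print(text_a == text_b, codec_a, codec_b)
```

In the quoted script, the call strg.decode( encoding ) could be replaced by decode_with_fallback( strg )[0]; this assumes Python 2, where the lines read from sys.stdin are byte strings.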