(windows or linux console)

>>> print u'\u034a'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\PYTHON23\lib\encodings\cp850.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u034a' in position 0: character maps to <undefined>
>>>

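For reference, a small sketch of how the same character behaves under the three error handlers this post is about ('strict', 'replace', 'backslashreplace'). The helper name try_encode is made up for illustration, and the snippet is written to run on current Python rather than the 2.3 shown in the traceback.

```python
# The character from the traceback; cp850 has no mapping for it.
text = u'\u034a'

def try_encode(s, encoding, errors):
    """Hypothetical helper: return the encoded bytes, or the exception raised."""
    try:
        return s.encode(encoding, errors)
    except UnicodeEncodeError as exc:
        return exc

# 'strict' is what print effectively uses in the session above: it raises.
strict_result = try_encode(text, 'cp850', 'strict')

# 'replace' substitutes a question mark for the unencodable character.
replace_result = try_encode(text, 'cp850', 'replace')            # b'?'

# 'backslashreplace' keeps the information as an escape sequence.
escape_result = try_encode(text, 'cp850', 'backslashreplace')    # b'\\u034a'
```

So the codecs themselves can be tolerant; the open question is how to make print use that mode.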
How can I get replacement behaviour into Python's print statement in general? By fiddling with sys.stdout/stderr? sys.stdout.write(u) at least puts out random chars, so print apparently does the encoding itself: it evidently fetches sys.stdout.encoding and encodes with 'strict'. Where is a good, portable place to hook in, e.g. to get something like .encode(xy, 'replace') or 'backslashreplace'? Shouldn't 'replace' be the default behaviour for (tty) output!?

Background: my file handling script fails on consoles that do not support all filename chars. I want my apps to run on every platform as smoothly, smartly and tolerantly as possible, without touching hundreds and thousands of print/output statements. (Input is a separate issue, of course.)

2nd problem, with the PythonWin output functions: the PythonWin/win32 functions (which obviously do not support wide unicode, neither automatically nor via xxxW functions) obviously use the Python default encoding, but try the default locale first (default locale, then 'ascii'/site.encoding, then an exception on occasion!). This can only be made tolerant of alien chars by hacking site.py/sitecustomize.py/encoding (very sad to have to do this on each Python installation). Or is there a PythonWin function to set the encoding? sys.setdefaultencoding is completely destroyed, not even preserved as sys._setdefaultencoding or so. (I mean setting it to 'mbcs', not the default locale (cp1252 on my machine), because only mbcs is tolerant of foreign chars and converts them to '?'.)

The PythonWin Scintilla editor/interactive window is (obviously) better: it apparently always uses 'mbcs'.

I have now decided to put 'mbcs' in site.py for Windows. Isn't that by far the best acceptable default solution? 'utf-8' in site.py would be acceptable to get some idea about alien chars, but will

Thus on my default Python/PythonWin installation on Windows, 4 encodings are in action simultaneously!!!!:

* 'ascii' in site.py / str()
* 'mbcs' in PythonWin interactive/editor
* 'cp1252'+'ascii' in PythonWin/win32 output functions
* 'cp850' at console output

... and all output is intolerant of alien chars! (Except 'mbcs', and that only on the primary _test_ field, PythonWin interactive!! :-( ) Wasn't that designed by the Python creators to drive developers crazy?

Now, by setting site.py/encoding to 'mbcs' (or 'utf-8'), the problems in PythonWin are partly solved. But so far I have no idea how to get mbcs output where the chars exist, and utf-8 or backslashreplace where they don't.

Also: is wide unicode output possible somehow with PythonWin, at least in certain cases? By WM_SETTEXT, ...SETITEM... tricks?

On Linux there is some improvement after setting site.py/encoding='utf-8'. Still, the locale-sensitive encoding on ttys should be tolerant/replace-mode by default.

Robert

PS: this guy is also somewhat angry about the current situation:
http://blog.ianbicking.org/do-i-hate-unicode-or-do-i-hate-ascii.html

GvR felt safe with 'ascii' for "future improvements" like utf-8:
http://mail.python.org/pipermail/python-dev/2002-March/020962.html

My suggestions:

* Win/Linux: guessing at least 'mbcs' on Win and 'utf-8' on Linux for site.encoding is by far worth the improvement step. Or provide a prominent function (not the fragile sitexxxx.py interface) to change it. (The current solution is very unportable and takes new programmers a very long time to understand.) And/or: make tty print somehow tolerant/char-replacing.
* PythonWin: always use 'mbcs' as the default encoding in win32 functions (mbcs_encode is tolerant/replacing in itself), or make the encoding tolerant/char-replacing. And: add xxxW functions or even automatic unicode switching for the major output functions (SetWindowText, SetItem, DrawText, ...).
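To make the first question concrete, here is a minimal sketch of the kind of hook I have in mind, using codecs.getwriter with a forgiving error handler. This is an assumption about what could work, not a confirmed recommendation: tolerant_writer is a made-up helper name, an in-memory byte stream stands in for a cp850 console, and the lines that would actually replace sys.stdout are left commented out.

```python
import codecs
import io
import sys

def tolerant_writer(byte_stream, encoding, errors='replace'):
    """Hypothetical helper: wrap a byte stream so that unencodable
    characters are substituted instead of raising UnicodeEncodeError."""
    return codecs.getwriter(encoding)(byte_stream, errors)

# Demonstrate against an in-memory stand-in for a cp850 console.
fake_console = io.BytesIO()
out = tolerant_writer(fake_console, 'cp850', 'backslashreplace')
out.write(u'file \u034a.txt\n')
console_bytes = fake_console.getvalue()   # b'file \\u034a.txt\n'

# Installing it for real. Assumptions: the stream reports a usable
# .encoding, and on newer Pythons the underlying byte stream is
# reachable as .buffer (on old Pythons sys.stdout is the byte stream).
# raw = getattr(sys.stdout, 'buffer', sys.stdout)
# sys.stdout = tolerant_writer(raw, sys.stdout.encoding or 'ascii')
```

With something like this installed once at startup, the existing print statements would not need to change. One caveat on the old Python used here: byte strings containing non-ASCII chars that pass through such a writer are first decoded with the default encoding, so they can still fail.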