New submission from STINNER Victor <victor.stin...@haypocalc.com>: To decode byte string from the locale encoding (LC_CTYPE), PyUnicode_DecodeFSDefault() can be used, but this function uses a constant encoding set at startup (the locale encoding at startup). The right method is currently to call _Py_char2wchar() and then PyUnicode_FromWideChar(). _Py_char2wchar() is a low level function, it doesn't raise nice Python exception, but just return NULL on error and write a message to stderr using fprintf() (!).
Attached patch adds PyUnicode_DecodeLocale() and PyUnicode_DecodeLocaleAndSize() to offer a high level API to decode data from the *current* locale encoding. These functions fail with an OSError or MemoryError if decoding fails (instead of a generic ValueError), and don't write to stderr anymore. They are a surrogateescape argument to choose to escape undecodable bytes or to fail with an error. The patch only uses the function in _localemodule.c, but other functions may have to be fixed to use the new function. The tzname_encoding.patch of issue #5905 should maybe use it for example. ---------- components: Unicode messages: 149060 nosy: ezio.melotti, haypo, loewis priority: normal severity: normal status: open title: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize versions: Python 3.3 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue13560> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com