New submission from STINNER Victor <victor.stin...@haypocalc.com>:

PyUnicode_Decode() and PyUnicode_AsEncodedString() call the builtin decoders/encoders directly for some known encodings (e.g. "utf-8"), instead of taking the slow path through PyCodec_Decode() / PyCodec_Encode().
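A rough sketch of the shortcut idea in C. This is not the CPython code or the attached patch: normalize_name() and has_shortcut() are hypothetical names, the list of known names is only illustrative, and the 32-byte buffer is an arbitrary choice. The name is lower-cased and "_" is mapped to "-" before the comparison, so spelling variants such as "UTF_8" reach the same fast path; anything else would fall back to the generic codec lookup.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not CPython's implementation): copy `name` into
   `out`, lower-casing letters and replacing '_' with '-'.
   Returns 1 on success, 0 if `out` is too small to hold the result.
   Note: tolower() is locale-dependent; a real implementation would
   likely use an ASCII-only table instead. */
static int normalize_name(const char *name, char *out, size_t outsize)
{
    size_t i;

    assert(name != NULL && out != NULL);
    if (outsize == 0)
        return 0;
    for (i = 0; name[i] != '\0'; i++) {
        if (i + 1 >= outsize)   /* keep room for the trailing NUL */
            return 0;
        if (name[i] == '_')
            out[i] = '-';
        else
            out[i] = (char)tolower((unsigned char)name[i]);
    }
    out[i] = '\0';
    return 1;
}

/* Hypothetical check: return 1 if `encoding` names a codec that has a
   fast path, 0 if the caller should use the generic (slow) lookup.
   The list below is an assumption made for illustration. */
static int has_shortcut(const char *encoding)
{
    static const char *const known[] = { "utf-8", "latin-1", "iso-8859-1" };
    char lower[32];
    size_t k;

    if (!normalize_name(encoding, lower, sizeof(lower)))
        return 0;               /* name too long: take the slow path */
    for (k = 0; k < sizeof(known) / sizeof(known[0]); k++) {
        if (strcmp(lower, known[k]) == 0)
            return 1;
    }
    return 0;
}
```

With this sketch, has_shortcut("UTF_8") and has_shortcut("ISO_8859_1") both return 1, while has_shortcut("utf-16") returns 0. A plain strcmp() against "utf-8" would miss the first two spellings.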
PyUnicode_Decode() normalizes the encoding name: it converts the name to lowercase and replaces "_" with "-", as normalizestring() does. PyUnicode_AsEncodedString(), however, doesn't normalize the encoding name; it just uses strcmp(). PyUnicode_Decode() also has a shortcut for ISO-8859-1, whereas PyUnicode_AsEncodedString() doesn't (only for "latin-1").

The attached patch creates a static helper function, normalize_encoding(), uses it in PyUnicode_Decode() and PyUnicode_AsEncodedString(), and adds a shortcut for ISO-8859-1 to PyUnicode_AsEncodedString().

----------
components: Unicode
files: unicode_shortcuts.patch
keywords: patch
messages: 107203
nosy: haypo, pitrou
priority: normal
severity: normal
status: open
title: Improve encoding shortcuts in PyUnicode_AsEncodedString()
type: performance
versions: Python 3.2
Added file: http://bugs.python.org/file17574/unicode_shortcuts.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8922>
_______________________________________