Nick Coghlan added the comment:

Just noting the exact list of codecs that currently bypass the full codec machinery and go directly to the C implementation. The fast path normalises the codec name (which includes forcing it to lowercase) and then uses strcmp to check it against a specific set of known encodings.
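
As a rough illustration of the name check just described, here is a minimal Python sketch. It is not the C code: `FAST_PATH_DECODINGS`, `normalise` and `takes_fast_path` are hypothetical names, the set contents are simply the decoding names listed in this comment, and the normalisation is reduced to lowercasing (the real normalisation may do more).

```python
# Names copied from the decoding list in this comment.
# "mbcs" only applies on Windows.
FAST_PATH_DECODINGS = frozenset({
    "utf-8", "utf8",
    "latin-1", "latin1",
    "iso-8859-1", "iso8859-1",
    "mbcs",
    "ascii",
    "utf-16", "utf-32",
})


def normalise(name):
    """Simplified normalisation: lowercase only (an assumption)."""
    return name.lower()


def takes_fast_path(name):
    """Return True if the normalised name is one of the known encodings."""
    return normalise(name) in FAST_PATH_DECODINGS


print(takes_fast_path("UTF-8"))   # exact match after lowercasing
print(takes_fast_path("hex"))     # not in the set, so full codec lookup
```

Any name that fails the check would fall through to the regular codec lookup.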
In PyUnicode_Decode (and hence bytes.decode and bytearray.decode):

  utf-8
  utf8
  latin-1
  latin1
  iso-8859-1
  iso8859-1
  mbcs (Windows only)
  ascii
  utf-16
  utf-32

In PyUnicode_AsEncodedString (and hence str.encode), the list is mostly the same, but utf-16 and utf-32 are not accelerated (i.e. they're currently still looked up through the codec machinery).

It may be worth opening a separate issue to restore the consistency between the lists by adding utf-16 and utf-32 to the fast path for encoding as well.

As far as the wrapping mechanism from issue #17828 itself goes:

- it only triggers if PyEval_CallObject on the encoder or decoder returns NULL
- stateful exceptions (which include UnicodeEncodeError and UnicodeDecodeError) and those with custom __init__ or __new__ implementations don't get wrapped
- the actual wrapping process is just the C equivalent of "raise type(exc)(new_msg) from exc", plus the initial checks to determine if the current exception can be wrapped safely
- it applies to the *general purpose* codec machinery, not just to the text model related convenience methods

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19619>
_______________________________________
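
The wrapping behaviour described above can be sketched in pure Python. This is only an approximation under stated assumptions: `can_wrap` and `call_codec` are hypothetical helpers, the safety checks are a simplified stand-in for the real C checks (custom __init__ or __new__ detection is omitted), and the message format is illustrative.

```python
def can_wrap(exc):
    """Simplified stand-in for the safety checks (not the actual C logic)."""
    if isinstance(exc, (UnicodeEncodeError, UnicodeDecodeError)):
        return False  # stateful: encoding, object, start, end, reason
    if len(exc.args) > 1 or vars(exc):
        return False  # extra state would be lost by type(exc)(new_msg)
    return True


def call_codec(func, data, operation, encoding):
    """Call func(data); on failure, chain a new error naming the codec."""
    try:
        return func(data)
    except Exception as exc:
        if not can_wrap(exc):
            raise  # leave the original exception untouched
        new_msg = "{} with {!r} codec failed ({}: {})".format(
            operation, encoding, type(exc).__name__, exc)
        # The Python equivalent quoted in the comment above.
        raise type(exc)(new_msg) from exc


def broken_decoder(data):
    """Hypothetical decoder that always fails with a plain ValueError."""
    raise ValueError("bad input")


try:
    call_codec(broken_decoder, b"x", "decoding", "demo")
except ValueError as err:
    print(err)            # new message naming the codec
    print(err.__cause__)  # the original exception
```

Because the new exception has the same type as the original, existing `except` clauses keep working, and the original is still reachable through `__cause__`.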