[issue19619] Blacklist base64, hex, ... codecs from bytes.decode() and str.encode()

Nick Coghlan Fri, 22 Nov 2013 18:18:05 -0800

Nick Coghlan added the comment:

Just noting the exact list of codecs that currently bypass the full codec 
machinery and go directly to the C implementation. The fast path normalises 
the codec name (which includes forcing it to lowercase) and then uses strcmp 
to check the result against a specific set of known encodings.


In PyUnicode_Decode (and hence bytes.decode and bytearray.decode):

utf-8
utf8
latin-1
latin1
iso-8859-1
iso8859-1
mbcs (Windows only)
ascii
utf-16
utf-32

In PyUnicode_AsEncodedString (and hence str.encode), the list is mostly the 
same, but utf-16 and utf-32 are not accelerated (i.e. they're currently still 
looked up through the codec machinery).

It may be worth opening a separate issue to restore the consistency between the 
lists by adding utf-16 and utf-32 to the fast path for encoding as well.
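
As a rough illustration of the name check described above, here is a pure 
Python sketch. It is not the C code: the helper names are made up for this 
example, the underscore-to-hyphen step is my assumption about what the 
normalisation does beyond lowercasing, and the encode set is approximated as 
"the decode set minus utf-16 and utf-32", since the lists are only described 
as mostly the same.

```python
# Names accelerated in PyUnicode_Decode, per the list above.
# "mbcs" only applies on Windows.
DECODE_FAST_PATH = frozenset([
    "utf-8", "utf8", "latin-1", "latin1", "iso-8859-1", "iso8859-1",
    "mbcs", "ascii", "utf-16", "utf-32",
])

# Approximation: str.encode is described as "mostly the same", without
# utf-16 and utf-32.
ENCODE_FAST_PATH = DECODE_FAST_PATH - {"utf-16", "utf-32"}


def normalise(name):
    """Hypothetical stand-in for the C-level name normalisation."""
    # Lowercasing is stated above; replacing "_" with "-" is an assumption.
    return name.lower().replace("_", "-")


def uses_fast_path(name, decoding):
    """Return True if this sketch would skip the codec registry lookup."""
    known = DECODE_FAST_PATH if decoding else ENCODE_FAST_PATH
    return normalise(name) in known
```

With this sketch, uses_fast_path("UTF-16", decoding=True) is true while 
uses_fast_path("UTF-16", decoding=False) is false, which is the inconsistency 
mentioned above. A name like "hex" never matches and always goes through the 
full codec machinery.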

As far as the wrapping mechanism from issue #17828 itself goes:

- it only triggers if PyEval_CallObject on the encoder or decoder returns NULL
- stateful exceptions (which include UnicodeEncodeError and 
UnicodeDecodeError) and those with custom __init__ or __new__ implementations 
don't get wrapped
- the actual wrapping process is just the C equivalent of "raise 
type(exc)(new_msg) from exc", plus the initial checks to determine if the 
current exception can be wrapped safely
- it applies to the *general purpose* codec machinery, not just to the text 
model related convenience methods
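
The wrapping behaviour in the list above can be approximated in pure Python 
as follows. This is only a sketch: call_codec is a hypothetical helper, the 
message format is illustrative, and the safety checks are reduced to letting 
UnicodeError subclasses through untouched. The real checks for custom 
__init__ or __new__ implementations happen at the C level and are not 
reproduced here.

```python
def call_codec(func, data, operation, encoding):
    """Call an encoder/decoder and wrap simple failures with more context.

    func, operation and encoding are illustrative parameters, not part of
    any real codec API.
    """
    try:
        return func(data)
    except UnicodeError:
        # Stateful exceptions such as UnicodeEncodeError and
        # UnicodeDecodeError are not wrapped; re-raise unchanged.
        raise
    except Exception as exc:
        # Illustrative message format (an assumption, not the exact text).
        new_msg = "{} with '{}' codec failed ({}: {})".format(
            operation, encoding, type(exc).__name__, exc)
        # The Python equivalent named above: same type, new message,
        # original exception chained as __cause__.
        raise type(exc)(new_msg) from exc
```

A decoder that raises ValueError("Odd-length string") would surface as a 
ValueError whose message names the operation and the codec, with the original 
exception available as __cause__.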

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19619>
_______________________________________