[issue8922] Improve encoding shortcuts in PyUnicode_AsEncodedString()

Marc-Andre Lemburg Mon, 07 Jun 2010 14:30:57 -0700

Marc-Andre Lemburg <[email protected]> added the comment:

STINNER Victor wrote:
> 
> New submission from STINNER Victor <[email protected]>:
> 
> PyUnicode_Decode() and PyUnicode_AsEncodedString() call builtin 
> decoders/encoders directly for some known encodings (e.g. "utf-8"), instead of 
> taking the slow path (calling PyCodec_Decode() / PyCodec_Encode()). 
> 
> PyUnicode_Decode() normalizes the encoding name: it converts it to lowercase and 
> replaces "_" with "-", as normalizestring() does. But 
> PyUnicode_AsEncodedString() doesn't normalize the encoding name; it just uses 
> strcmp(). PyUnicode_Decode() has a shortcut for ISO-8859-1, whereas 
> PyUnicode_AsEncodedString() doesn't (it only has one for "latin-1").
> 
> The attached patch creates a static helper function, normalize_encoding(), uses it in 
> PyUnicode_Decode() and PyUnicode_AsEncodedString(), and adds a shortcut for 
> ISO-8859-1 to PyUnicode_AsEncodedString().


The normalization in PyUnicode_Decode() must have been added to
Python3 only. It is not present in Python2.

I'm not sure whether it's a good idea to extend this further:
the shortcuts were meant for Python-internal use only. Python
itself and its stdlib should only use the shortcut names
for the respective special encodings, not variants of them.

Dealing with variants and normalization is left to the encodings
package and its alias machinery.

Since the Python stdlib and the core already mostly use
the shortcut names, adding normalization won't buy us much.

Note that your change has also made it impossible for the
compiler to unroll the loop: there's no longer a fixed upper
limit on the size of the lower buffer.

In terms of coding style, "static" should go on a separate line.

----------
nosy: +lemburg

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8922>
_______________________________________