New submission from STINNER Victor <victor.stin...@haypocalc.com>:

PyUnicode_Decode() and PyUnicode_AsEncodedString() call builtin
decoders/encoders directly for some well-known encodings (e.g. "utf-8"),
instead of taking the slow path (PyCodec_Decode() / PyCodec_Encode()).

PyUnicode_Decode() normalizes the encoding name: it converts it to lowercase
and replaces "_" with "-", as normalizestring() does. PyUnicode_AsEncodedString(),
however, does not normalize the encoding name; it just uses strcmp().
PyUnicode_Decode() also has a shortcut for "ISO-8859-1", whereas
PyUnicode_AsEncodedString() only has one for "latin-1".
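To illustrate the normalization step, here is a minimal C sketch. It is not
the code from the attached patch: the name normalize_encoding() comes from the
patch description below, but the signature (caller-supplied buffer, 1/0 return
value) is an assumption. It lowercases ASCII letters only, so the result does
not depend on the C locale, and maps "_" to "-".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy `encoding` into `lower` (a buffer of `lower_len`
 * bytes), lowercasing ASCII letters and replacing '_' with '-'.
 * Returns 1 on success, or 0 if the name does not fit in the buffer, in which
 * case a caller would fall back to the slow codec-registry path. */
static int
normalize_encoding(const char *encoding, char *lower, size_t lower_len)
{
    const char *e = encoding;
    char *l = lower;
    char *l_end;

    if (lower_len == 0)
        return 0;
    l_end = lower + lower_len - 1;        /* reserve room for the NUL */

    while (*e) {
        if (l == l_end)
            return 0;                     /* name too long for the buffer */
        if (*e == '_')
            *l++ = '-';
        else if (*e >= 'A' && *e <= 'Z')
            *l++ = (char)(*e - 'A' + 'a');  /* ASCII-only lowercase */
        else
            *l++ = *e;
        e++;
    }
    *l = '\0';
    return 1;
}
```

With this helper, "ISO_8859_1" and "UTF_8" normalize to "iso-8859-1" and
"utf-8", so a plain strcmp() against the lowercase, hyphenated spelling is
enough to recognize them.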

The attached patch creates a static helper function, normalize_encoding(),
uses it in both PyUnicode_Decode() and PyUnicode_AsEncodedString(), and adds
a shortcut for ISO-8859-1 to PyUnicode_AsEncodedString().
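A rough sketch of how a shortcut lookup could work once the name has been
normalized. The enum and lookup_shortcut() are hypothetical, and the alias
list is illustrative rather than the one in the patch. The point is that
"iso-8859-1" and "latin-1" map to the same builtin encoder.

```c
#include <string.h>

/* Hypothetical identifiers for the builtin encoders a shortcut can reach. */
typedef enum {
    SHORTCUT_NONE,      /* no shortcut: take the slow PyCodec_Encode() path */
    SHORTCUT_UTF8,
    SHORTCUT_LATIN1
} shortcut_t;

/* Expects a name that is ALREADY normalized (lowercase, '_' -> '-').
 * Both Latin-1 spellings resolve to the same encoder. */
static shortcut_t
lookup_shortcut(const char *normalized)
{
    if (strcmp(normalized, "utf-8") == 0)
        return SHORTCUT_UTF8;
    if (strcmp(normalized, "latin-1") == 0
        || strcmp(normalized, "iso-8859-1") == 0)
        return SHORTCUT_LATIN1;
    return SHORTCUT_NONE;   /* unknown here: defer to the codec registry */
}
```

In a real PyUnicode_AsEncodedString(), each result would presumably dispatch
to the matching C API encoder (e.g. PyUnicode_AsUTF8String() or
PyUnicode_AsLatin1String()), and SHORTCUT_NONE would fall through to
PyCodec_Encode().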

----------
components: Unicode
files: unicode_shortcuts.patch
keywords: patch
messages: 107203
nosy: haypo, pitrou
priority: normal
severity: normal
status: open
title: Improve encoding shortcuts in PyUnicode_AsEncodedString()
type: performance
versions: Python 3.2
Added file: http://bugs.python.org/file17574/unicode_shortcuts.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8922>
_______________________________________