Marc-Andre Lemburg <m...@egenix.com> added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
> 
> Ezio and I discussed on IRC the implementation of alias lookup and neither of 
> us was able to point to the function that strips non-alphanumeric 
> characters from encoding names.

I think you are misunderstanding the way the codec registry works.

You register codec search functions with it, and each of them then has
to try to map a given encoding name to a codec module.

The stdlib ships with one such function (defined in encodings/__init__.py).
It is registered with the codec registry by default.

The codec search function takes care of any normalization and conversion
to the module name used by the codecs from that codec package.
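
A minimal sketch of how such a search function plugs into the registry.
The encoding name "xdemoalias" and the function demo_search are made up
for illustration; the sketch simply hands back the stdlib's UTF-8 codec
for that name. It also records the name the registry passes in, which
arrives already lower-cased:

```python
import codecs

seen_names = []  # names the registry hands to our search function


def demo_search(name):
    """Hypothetical search function: maps 'xdemoalias' to the UTF-8 codec."""
    seen_names.append(name)
    if name == "xdemoalias":
        # Reuse the CodecInfo of an existing codec for this sketch.
        return codecs.lookup("utf-8")
    return None  # tell the registry we do not know this encoding


codecs.register(demo_search)

# The registry lower-cases the name before calling the search functions.
info = codecs.lookup("XDemoAlias")
encoded = "abc".encode("XDemoAlias")

print(seen_names[0])  # the normalized name our function received
print(info.name, encoded)
```

The stdlib search function is asked first because it was registered
first; demo_search only sees names that function could not resolve.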

> It turns out that there are three "normalize" functions that are successively 
> applied to the encoding name during evaluation of str.encode/str.decode.
> 
> 1. normalize_encoding() in unicodeobject.c

This was added so that the few shortcuts we have in the C code
for commonly used codecs match more encoding aliases.

The shortcuts completely bypass the codec registry and also
bypass the function call overhead incurred by codecs
run via the codec registry.

> 2. normalizestring() in codecs.c

This is the normalization applied by the codec registry. See PEP 100
for details:

"""
    Search functions are expected to take one argument, the encoding
    name in all lower case letters and with hyphens and spaces
    converted to underscores, ...
"""

> 3. normalize_encoding() in encodings/__init__.py

This is part of the stdlib encodings package's codec search
function.
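
As a rough illustration of what that search function does with a name,
here is a sketch that uses encodings.normalize_encoding() and the alias
table in encodings.aliases. The helper stdlib_module_name is made up for
this example and only approximates the real search function; the
.lower() call stands in for the lower-casing the codec registry has
already done by the time the search function is called:

```python
import encodings
from encodings import aliases


def stdlib_module_name(encoding):
    """Hypothetical helper: approximate the codec module name lookup."""
    # Collapse runs of non-alphanumeric characters into one underscore.
    norm = encodings.normalize_encoding(encoding).lower()
    # Map known aliases to the module name; otherwise use the name as is.
    return aliases.aliases.get(norm, norm)


print(encodings.normalize_encoding("ISO-8859-1"))  # ISO_8859_1
print(stdlib_module_name("ISO-8859-1"))            # latin_1
print(stdlib_module_name("UTF-8"))                 # utf_8
```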

> Each performs a slightly different transformation and only the last one 
> strips non-alphanumeric characters.
> 
> The complexity of codec lookup is comparable with that of the import 
> mechanism!

It's flexible, but not really complex.

I hope the above clarifies the reasons for the three normalization
functions.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5902>
_______________________________________