Marc-Andre Lemburg <m...@egenix.com> added the comment:

Alexander Belopolsky wrote:
>
> Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
>
> Ezio and I discussed on IRC the implementation of alias lookup and neither of
> us was able to point out to the function that strips non-alphanumeric
> characters from encoding names.
I think you are misunderstanding the way the codec registry works. You
register codec search functions with it, which then have to try to map a
given encoding name to a codec module.

The stdlib ships with one such function (defined in encodings/__init__.py).
It is registered with the codec registry by default. The codec search
function takes care of any normalization and conversion to the module name
used by the codecs from that codec package.

> It turns out that there are three "normalize" functions that are successively
> applied to the encoding name during evaluation of str.encode/str.decode.
>
> 1. normalize_encoding() in unicodeobject.c

This was added so that the few shortcuts we have in the C code for commonly
used codecs match more encoding aliases. The shortcuts completely bypass the
codec registry and also bypass the function call overhead incurred by codecs
run via the codec registry.

> 2. normalizestring() in codecs.c

This is the normalization applied by the codec registry. See PEP 100 for
details:

"""
Search functions are expected to take one argument, the encoding name in
all lower case letters and with hyphens and spaces converted to
underscores, ...
"""

> 3. normalize_encoding() in encodings/__init__.py

This is part of the stdlib encodings package's codec search function.

> Each performs a slightly different transformation and only the last one
> strips non-alphanumeric characters.
>
> The complexity of codec lookup is comparable with that of the import
> mechanism!

It's flexible, but not really complex. I hope the above clarifies the
reasons for the three normalization functions.

----------
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5902>
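To make the layering described above concrete, here is a small Python 3 sketch. It assumes a modern CPython 3 interpreter and only covers layers 2 and 3: the C shortcut normalizer in unicodeobject.c (layer 1) is internal and not exposed to Python code. The names `recording_search` and "DemoCodecXYZ" are hypothetical, made up for illustration. Exactly which characters the registry rewrites before calling a search function has changed between releases (hyphen handling changed in 3.9), so the sketch relies only on the lowercasing, which every version performs.

```python
import codecs
import encodings

# Layer 3: the stdlib search function's normalizer (encodings/__init__.py).
# It collapses runs of non-alphanumeric characters (other than '.') into a
# single underscore and drops them at the ends. It does not lowercase;
# the registry has already done that before calling the search function.
print(encodings.normalize_encoding("ISO 8859-1"))   # ISO_8859_1
print(encodings.normalize_encoding("utf--8"))       # utf_8

# Layer 2: the registry itself (normalizestring() in codecs.c) lowercases
# the name before handing it to each registered search function.
# Hypothetical search function that only records the name it receives.
seen = []

def recording_search(name):
    seen.append(name)
    return None  # None means "not mine"; the registry tries the next function

codecs.register(recording_search)
try:
    codecs.lookup("DemoCodecXYZ")  # hypothetical name that no codec provides
except LookupError:
    pass
finally:
    # codecs.unregister() only exists in Python 3.10+.
    if hasattr(codecs, "unregister"):
        codecs.unregister(recording_search)

print(seen)  # ['democodecxyz']

# End to end: the extra punctuation is harmless, because the stdlib search
# function normalizes it away before mapping the name to encodings.utf_8.
print("abc".encode("UTF--8"))        # b'abc'
print(codecs.lookup("UTF--8").name)  # utf-8
```

The recording function returns None for every name, so registering it does not change how any real codec is found; it merely observes what the registry passes along after the stdlib search function has declined the name.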