[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

STINNER Victor Tue, 14 Jan 2020 13:56:20 -0800

New submission from STINNER Victor <vstin...@python.org>:

bpo-37751 changed codecs.lookup() in a subtle way: non-ASCII characters are now 
ignored, whereas they were copied unmodified previously.


I would prefer that codecs.lookup() and encodings.normalize_encoding() behave 
the same: either always ignore non-ASCII characters or always copy them.

Moreover, it seems like there is no test on how the encoding names are 
normalized in codecs.register(). I recall that using codecs.register() in a 
unit test causes trouble since there is no API to unregister a search 
function. Maybe we should just add a private function for tests in _testcapi.
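
A rough sketch of what such a test could observe, assuming a made-up codec 
name ("X-Hypothetical-Codec") and a hypothetical helper, recording_search(). 
The search function only records the name that codecs.lookup() passes to it. 
The cleanup relies on codecs.unregister(), which only exists on Python 3.10 
and later; on older versions the function stays registered, which is exactly 
the problem described above.

```python
import codecs

seen = []  # normalized names handed to our search function


def recording_search(name):
    # Hypothetical helper: record the name exactly as codecs.lookup()
    # passes it, then return None so the lookup goes on and fails.
    seen.append(name)
    return None


codecs.register(recording_search)
try:
    try:
        codecs.lookup("X-Hypothetical-Codec")  # made-up name, no such codec
    except LookupError:
        pass
finally:
    # codecs.unregister() is only available on Python 3.10+; on older
    # versions the search function stays registered for the whole process.
    unregister = getattr(codecs, "unregister", None)
    if unregister is not None:
        unregister(recording_search)

# The exact spelling (hyphens vs. underscores) depends on the Python version.
print(seen)
```

The recorded name is always lowercased, but how punctuation and non-ASCII 
characters are treated depends on the Python version, so a real test would 
have to pin down the expected behavior first.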

Serhiy Storchaka wrote an example on my PR:
https://github.com/python/cpython/pull/17997/files

> There are other differences. For example, normalize_encoding("КОИ-8") returns 
> "кои_8", but codecs.lookup normalizes it to "8".

> The comment in the sources is also not correct.
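
To make the two behaviors concrete, here is a minimal pure-Python sketch. The 
names normalize_copy() and normalize_ignore() are hypothetical and are not the 
actual CPython implementations; they only model the "copy" and "ignore" 
strategies for non-ASCII characters, and neither lowercases its input.

```python
def normalize_copy(name):
    """Collapse punctuation runs to '_' and keep every alphanumeric char."""
    chars = []
    punct = False
    for c in name:
        if c.isalnum() or c == ".":
            if punct and chars:
                chars.append("_")
            chars.append(c)
            punct = False
        else:
            punct = True
    return "".join(chars)


def normalize_ignore(name):
    """Same as normalize_copy(), but silently drop non-ASCII characters."""
    chars = []
    punct = False
    for c in name:
        if c.isalnum() or c == ".":
            if punct and chars:
                chars.append("_")
            if c.isascii():
                chars.append(c)
            punct = False
        else:
            punct = True
    return "".join(chars)


print(normalize_copy("КОИ-8"))    # non-ASCII letters are kept
print(normalize_ignore("КОИ-8"))  # only the ASCII digit survives
```

For plain ASCII names such as "utf-8" both strategies agree; they only 
diverge once the name contains non-ASCII characters.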

----------
components: Library (Lib)
messages: 360004
nosy: lemburg, serhiy.storchaka, vstinner
priority: normal
severity: normal
status: open
title: codecs.lookup() ignores non-ASCII characters, whereas 
encodings.normalize_encoding() copies them
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39337>
_______________________________________
