New submission from STINNER Victor <vstin...@python.org>:

While working on bpo-46659, I found a bug in the encodings "mbcs" alias. Even though the function has 2 tests (in test_codecs and test_site), both tests missed the bug :-(

I fixed the alias with this change:
---
commit 04dd60e50cd3da48fd19cdab4c0e4cc600d6af30
Author: Victor Stinner <vstin...@python.org>
Date:   Sun Feb 6 21:50:09 2022 +0100

    bpo-46659: Update the test on the mbcs codec alias (GH-31168)

    encodings registers the _alias_mbcs() codec search function before
    the search_function() codec search function. Previously, the
    _alias_mbcs() was never used.

    Fix the test_codecs.test_mbcs_alias() test: use the current ANSI
    code page, not a fake ANSI code page number.

    Remove the test_site.test_aliasing_mbcs() test: the alias is now
    implemented in the encodings module, no longer in the site module.
---

But Eryk found two bugs:

"""
This was never true before. With 1252 as my ANSI code page, I checked codecs.lookup('cp1252') in 2.7, 3.4, 3.5, 3.6, 3.9, and 3.10, and none of them return the "mbcs" encoding. It's not equivalent, and not supposed to be. The implementation of "cp1252" should be cross-platform, regardless of whether we're on a Windows system with 1252 as the ANSI code page, as opposed to a Windows system with some other ANSI code page, or a Linux or macOS system.

The differences are that "mbcs" maps every byte, whereas our code-page encodings do not map undefined bytes, and the "replace" handler of "mbcs" uses a best-fit mapping (e.g. "α" -> "a") when encoding text, instead of mapping all undefined characters to "?".
"""

and my new test fails if the PYTHONUTF8=1 env var is set:

"""
This will fail if PYTHONUTF8 is set in the environment, because it overrides getpreferredencoding(False) and _get_locale_encoding().
"""

The code for the "mbcs" alias changed a lot between Python 3.5 and 3.7.

In Python 3.5, site module:
---
def aliasmbcs():
    """On Windows, some default encodings are not provided by Python,
    while they are always available as "mbcs" in each locale. Make
    them usable by aliasing to "mbcs" in such a case."""
    if sys.platform == 'win32':
        import _bootlocale, codecs
        enc = _bootlocale.getpreferredencoding(False)
        if enc.startswith('cp'):            # "cp***" ?
            try:
                codecs.lookup(enc)
            except LookupError:
                import encodings
                encodings._cache[enc] = encodings._unknown
                encodings.aliases.aliases[enc] = 'mbcs'
---

In Python 3.6, encodings module:
---
(...)
codecs.register(search_function)

if sys.platform == 'win32':
    def _alias_mbcs(encoding):
        try:
            import _bootlocale
            if encoding == _bootlocale.getpreferredencoding(False):
                import encodings.mbcs
                return encodings.mbcs.getregentry()
        except ImportError:
            # Imports may fail while we are shutting down
            pass

    codecs.register(_alias_mbcs)
---

Python 3.7, encodings module:
---
(...)
codecs.register(search_function)

if sys.platform == 'win32':
    def _alias_mbcs(encoding):
        try:
            import _winapi
            ansi_code_page = "cp%s" % _winapi.GetACP()
            if encoding == ansi_code_page:
                import encodings.mbcs
                return encodings.mbcs.getregentry()
        except ImportError:
            # Imports may fail while we are shutting down
            pass

    codecs.register(_alias_mbcs)
---

The Python 3.6 and 3.7 "codecs.register(_alias_mbcs)" has no effect: "search_function()" is registered first, so it is tried first, and it already succeeds for "cpXXX" encodings. _alias_mbcs() is therefore never reached for them.

My change alters the order in which codec search functions are registered: first the MBCS alias, then the encodings search_function().

In Python 3.5, the alias was only created if Python didn't support the code page.

----------
components: Library (Lib)
messages: 412678
nosy: vstinner
priority: normal
severity: normal
status: open
title: encodings: the "mbcs" alias doesn't work
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46668>
_______________________________________
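
The registration-order effect described in this message can be reproduced on any platform with a small sketch. It is not the encodings module code: the codec name "demo_alias_46668" and the two search functions below are hypothetical and exist only for illustration. The sketch assumes nothing beyond codecs.register() and codecs.lookup(), and the guarded cleanup uses codecs.unregister(), which only exists in Python 3.10 and later.

```python
import codecs

# Record which hypothetical search function was consulted, and for which name.
calls = []

def first_search(name):
    # Registered first: claims the hypothetical name and resolves it to UTF-8.
    calls.append(("first", name))
    if name == "demo_alias_46668":
        return codecs.lookup("utf-8")
    return None

def second_search(name):
    # Registered second: would resolve the same name to Latin-1, but lookup
    # stops at the first search function that returns a CodecInfo.
    calls.append(("second", name))
    if name == "demo_alias_46668":
        return codecs.lookup("latin-1")
    return None

codecs.register(first_search)
codecs.register(second_search)

# The stdlib encodings search function (registered at startup) does not know
# this name and returns None, so first_search is tried next and wins.
info = codecs.lookup("demo_alias_46668")
print(info.name)
print(calls)

# Cleanup: codecs.unregister() is only available in Python 3.10+.
unregister = getattr(codecs, "unregister", None)
if unregister is not None:
    unregister(first_search)
    unregister(second_search)
```

In this sketch second_search plays the role that _alias_mbcs() plays in Python 3.6 and 3.7: it is registered after a search function that already resolves the name, so it is never consulted for it.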