On 10/21/19, Albert-Jan Roskam <sjeik_ap...@hotmail.com> wrote:
> On 18 Oct 2019 20:36, Chris Angelico <ros...@gmail.com> wrote:
>
>> That's correct. The output of the command is, by default, given to you
>> in bytes.
>
> Do you happen to know why this is the default? And is there a reliable way
> to figure out the encoding? On posix, it's probably utf8, but on windows I
> usually use cp437, but knowing windows, it could be any codepage

In Python 3.6+ on Windows, use the "oem" encoding instead of assuming that
the OEM codepage is 437. In Western Europe, OEM is 850, and in Windows 10 it
can even be set to 65001 (i.e. UTF-8). Python also supports "ansi" (an
alias of "mbcs"). These two are implemented via codecs.code_page_encode and
codecs.code_page_decode, so, for better or worse, they use the Windows
best-fit 'replace' error handling instead of just "?". For example:

    >>> import unicodedata
    >>> c = '\N{GREEK SMALL LETTER BETA}'
    >>> c_oem = c.encode('oem', 'replace').decode('oem')
    >>> c_oem
    'ß'
    >>> unicodedata.name(c_oem)
    'LATIN SMALL LETTER SHARP S'

I'd also like to have something like "conin" and "conout" encodings that use
the attached console's current input and output codepages. But at least
it's simple to write a little ctypes-based function that implements this.

When writing to a pipe, almost all Windows command-line programs default to
one of OEM, ANSI, the current console input or output codepage, UTF-8, or
UTF-16. The latter two may also write a UTF byte order mark (BOM).
Sometimes the output encoding can be configured via command-line options or
environment variables (e.g. ipconfig.exe supports an "OutputEncoding"
environment variable).

> (you can even change it with chcp.exe)

It's actually "chcp.com". Thus subprocess.Popen('chcp') fails, because
CreateProcessW only appends ".EXE" when looking for the executable. This
binary uses the ".com" extension for compatibility with legacy batch
scripts. But don't let the extension fool you: it's just a regular Windows
PE binary, not a 16-bit MS-DOS binary.

As mentioned above, some programs use either the console input or output
codepage when writing to a pipe. This does not include Windows Python,
however, which instead defaults to ANSI. This can be overridden via
environment variables and command-line options that set the standard I/O
encoding (PYTHONIOENCODING) or force UTF-8 mode (PYTHONUTF8=1 or -X utf8,
in Python 3.7+).

-- 
https://mail.python.org/mailman/listinfo/python-list
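A minimal sketch of the "little ctypes-based function" mentioned above, assuming Python 3 on Windows. It calls kernel32's GetConsoleCP and GetConsoleOutputCP and turns the codepage number into a Python codec name. The helper names console_codepages, codec_name and console_encoding are hypothetical (Python has no built-in "conin"/"conout" codec), and on non-Windows platforms the console functions simply return None.

```python
import codecs
import ctypes
import sys


def console_codepages():
    """Return (input_cp, output_cp) for the attached console, or None.

    Calls the Win32 functions GetConsoleCP and GetConsoleOutputCP from
    kernel32.dll; both return 0 on failure (e.g. no attached console).
    Returns None on non-Windows platforms.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetConsoleCP.restype = ctypes.c_uint
    kernel32.GetConsoleOutputCP.restype = ctypes.c_uint
    input_cp = kernel32.GetConsoleCP()
    output_cp = kernel32.GetConsoleOutputCP()
    if input_cp == 0 or output_cp == 0:
        return None  # no console attached
    return input_cp, output_cp


def codec_name(codepage):
    """Map a codepage number to a normalized Python codec name, or None.

    On Windows, "cpNNN" codecs are available for all code pages; elsewhere
    only the codepages Python ships tables for can be looked up.
    """
    try:
        return codecs.lookup("cp%d" % codepage).name
    except LookupError:
        return None


def console_encoding(kind="output"):
    """Hypothetical stand-in for the wished-for "conin"/"conout" encodings.

    kind="input" behaves like "conin", anything else like "conout".
    Returns a codec name, or None if there is no console (or not Windows).
    """
    cps = console_codepages()
    if cps is None:
        return None
    input_cp, output_cp = cps
    return codec_name(input_cp if kind == "input" else output_cp)
```

On Windows, a program that writes to a pipe using the console output codepage could then be decoded with something like `data.decode(console_encoding("output") or "oem", "replace")`. Running chcp.com in the same console changes what these functions report.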
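As for overriding the encoding that a child Windows Python uses on a pipe (ANSI by default, as noted above), a minimal sketch, assuming the child is another Python 3 interpreter: set PYTHONIOENCODING in the child's environment so the parent knows exactly how to decode the bytes instead of guessing. The function name run_python_child is hypothetical.

```python
import os
import subprocess
import sys


def run_python_child(code):
    """Run `code` in a child Python and return its stdout as text.

    PYTHONIOENCODING=utf-8 overrides the child's default standard I/O
    encoding (ANSI on a Windows pipe, the locale encoding elsewhere),
    so the parent can decode the pipe as UTF-8.
    """
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        env=env,
        check=True,  # raise CalledProcessError if the child fails
    )
    return completed.stdout.decode("utf-8")
```

On Python 3.7+, passing `-X utf8` (or setting PYTHONUTF8=1) to enable UTF-8 mode in the child has a similar effect, although an explicit PYTHONIOENCODING still takes precedence for the standard streams. Note that on Windows the child's print() writes "\r\n", so strip line endings before comparing.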