Dāvis added the comment:

Of course I agree that the proper solution is to use the Unicode/wide API, but 
that's much more work to implement, and I would rather have this half-fix now, 
which works for most cases, than nothing until Unicode support is implemented, 
which might not be anytime soon.


> IMO, it makes more sense for programs to use UTF-8, or even UTF-16. Codepages 
> are a legacy that we need to move beyond. Internally the console uses 
> UTF-16LE. 

Yes, that's true, but we can't do anything about existing programs, so 
defaulting to UTF-8 would be even worse than defaulting to ANSI, because few 
programs on Windows use UTF-8. In fact it's quite rare, because the console 
itself doesn't even have good UTF-8 support, as you mentioned. Also, I'm 
talking here only about ANSI WinAPI programs and their console/pipe encoding, 
not internal or file encoding, which we don't really care about here.


> Note that patch 3 requires setting `encoding` for even python.exe as a child 
> process, because sys.std* default to ANSI when isatty(fd) isn't true.

I think Python is a bit broken here. IMO it should also use the console's 
encoding, not ANSI, when outputting to a console pipe, and use ANSI only if the 
output really is a file.
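
A minimal sketch of that idea, assuming the process is still attached to a 
console when its stdout is a pipe. The helper name `console_output_encoding` is 
hypothetical and this is not what Python does today; it only shows how the 
console's output code page could be queried with GetConsoleOutputCP via ctypes:

```python
import codecs
import locale
import sys


def console_output_encoding():
    """Return the attached console's output code page as a codec name.

    Hypothetical helper: falls back to the locale's preferred encoding
    when no console is attached or the platform is not Windows.
    """
    if sys.platform == "win32":
        import ctypes

        codepage = ctypes.windll.kernel32.GetConsoleOutputCP()
        if codepage:  # 0 means no console is attached
            name = "cp%d" % codepage
            try:
                codecs.lookup(name)  # make sure Python has this codec
                return name
            except LookupError:
                pass
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(console_output_encoding())
```

With the code page set to 775, this would be expected to report cp775 rather 
than the ANSI cp1257 seen in the transcripts below.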


On Windows 10 with Python 3.5.1:

    >chcp
    Active code page: 775
    >python -c "print('ā')"
    ā

    >python -c "print('ā')" | echo
    ECHO is on.
    Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1257'>
    OSError: [Errno 22] Invalid argument

    >chcp 1257
    Active code page: 1257
    >python -c "print('ā')" | echo
    ECHO is on.
    Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1257'>
    OSError: [Errno 22] Invalid argument


In PowerShell:

    >[Console]::OutputEncoding.CodePage
    775
    >python -c "print('ā')" | Out-String
    Ō
    >[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
    >python -c "print('ā')" | Out-String
    �
    >[Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding(1257)
    >python -c "print('ā')" | Out-String
    ā
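
The three PowerShell results above can be reproduced without a console. This 
is only a sketch of the byte-level mismatch, assuming the child writes cp1257 
(ANSI) bytes to the pipe and the reader decodes them with whatever 
[Console]::OutputEncoding is set to:

```python
# What the child process writes to the pipe: 'ā' encoded as ANSI (cp1257).
raw = "ā\r\n".encode("cp1257")

# What the reading side makes of those bytes under each output encoding.
for reader_encoding in ("cp775", "utf-8", "cp1257"):
    text = raw.decode(reader_encoding, errors="replace")
    print(reader_encoding, ascii(text))
```

Only the reader that assumes cp1257 gets the original character back.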


> I proposed using the "/u" switch for shell=True only to facilitate getting 
> results back from cmd's internal commands such as `set`. But it doesn't 
> change anything if you're using the shell to run other programs.

But you can only do that if you know that the command you execute is one of 
cmd's internal commands. If the command is passed in by the user, there isn't 
really a reliable way to detect whether it will execute inside cmd or not. For 
example, "cmd /u /c chcp.exe" will return its result in UTF-16, because no such 
program exists and cmd's own error message will be output. Also, if the user 
has a set.exe in %System32%, then "cmd /u /c set" won't be in UTF-16, because 
it will execute that program.
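
To illustrate the problem, here is a sketch of what a caller of "cmd /u /c ..." 
would have to do. The function and its parameters are hypothetical; the point 
is that the `produced_by_cmd` flag cannot be derived reliably from the command 
string, yet the correct decoding depends on it:

```python
def decode_cmd_output(raw, produced_by_cmd, oem_encoding):
    """Decode bytes captured from ``cmd /u /c <command>`` (hypothetical).

    produced_by_cmd: True if cmd itself wrote the bytes (an internal
        command or cmd's own error message), False if an external
        program wrote them.
    oem_encoding: the code page an external program is assumed to use.
    """
    if produced_by_cmd:
        # With /u, cmd writes its own output to pipes as UTF-16LE.
        return raw.decode("utf-16-le")
    # External programs ignore /u and write in their own code page.
    return raw.decode(oem_encoding, errors="replace")


if __name__ == "__main__":
    print(decode_cmd_output("ā".encode("utf-16-le"), True, "cp775"))
    print(decode_cmd_output("ā".encode("cp775"), False, "cp775"))
```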



>> by calling GetConsoleOutputCP inside child process with CreateRemoteThread

> That's not the only way. You can also start a detached Python process (via 
> pythonw.exe or DETACHED_PROCESS) to run a script that calls AttachConsole and 
> returns the result of calling GetConsoleOutputCP:

While that's useful to know, it's still messy: I think you would need to 
prevent the target process from exiting before you've called AttachConsole. 
Also, you most likely want the result of GetConsoleOutputCP just before the 
program exits rather than at its start (say, with CREATE_SUSPENDED), because 
the program might change the code page somewhere in the middle of its 
execution. So it looks like this route isn't worth pursuing.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27179>
_______________________________________