eryksun added the comment:

> PS C:\Users\jaraco> echo £ | py -3 -c "import sys;
> print(repr(sys.stdin.buffer.read()))"
> b'?\r\n'
> Curiously, it appears as if powershell is actually receiving
> a question mark from the pipe.

PowerShell calls ReadConsoleW to read the console input buffer, i.e. it reads "£" as a wide character from the command line. The default encoding when writing to the pipe should be ASCII [*]. If that's the case, it explains the question mark that Python reads from stdin: it's the default replacement character (WC_DEFAULTCHAR) used by WideCharToMultiByte.

[*] http://blogs.msdn.com/b/powershell/archive/2006/12/11/outputencoding-to-the-rescue.aspx

You can change PowerShell's output encoding to match the console:

    $OutputEncoding = [Console]::OutputEncoding

If the console codepage is 65001, the above is equivalent to setting

    $OutputEncoding = [System.Text.Encoding]::UTF8

http://msdn.microsoft.com/en-us/library/system.text.encoding.utf8

As Victor mentioned, this setting always writes a BOM, and under codepage 65001 it actually writes 2 BOMs (at least in PowerShell 2). Victor also mentioned that you can avoid the BOM by passing $False to the constructor:

    $OutputEncoding = New-Object System.Text.UTF8Encoding($False)

http://msdn.microsoft.com/en-us/library/system.text.utf8encoding

There's still a BOM under codepage 65001, but maybe that's fixed in PowerShell 3.

I avoid setting the console to codepage 65001 anyway. ReadFile/WriteFile incorrectly return the number of characters read/written instead of the number of bytes, because the call is actually handled by ReadConsoleA/WriteConsoleA. Maybe that's finally fixed in Windows 8.

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21927>
_______________________________________
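
For anyone who wants to see the effect without PowerShell in the loop, the question mark and the BOM behaviour described above can be approximated in pure Python. This is only a sketch: `pipe_bytes` is a hypothetical helper, and `errors="replace"` merely stands in for the default-character substitution that WideCharToMultiByte performs; it is not how PowerShell itself does the conversion.

```python
import codecs


def pipe_bytes(text, encoding, bom=False):
    """Approximate the bytes a pipe reader would see for `text`.

    Hypothetical helper: errors="replace" substitutes b"?" for characters
    the target encoding cannot represent, which mimics the default
    replacement character mentioned in the comment.
    """
    data = text.encode(encoding, errors="replace")
    if bom:
        # Prepend a UTF-8 byte order mark, as a BOM-emitting encoder would.
        data = codecs.BOM_UTF8 + data
    return data


line = "\u00a3\r\n"  # "£" followed by the CRLF that echo appends

ascii_bytes = pipe_bytes(line, "ascii")
utf8_bytes = pipe_bytes(line, "utf-8")
utf8_bom_bytes = pipe_bytes(line, "utf-8", bom=True)

print(ascii_bytes)     # b'?\r\n'
print(utf8_bytes)      # b'\xc2\xa3\r\n'
print(utf8_bom_bytes)  # b'\xef\xbb\xbf\xc2\xa3\r\n'

# "utf-8-sig" strips one leading BOM on decode...
single = utf8_bom_bytes.decode("utf-8-sig")
# ...so a doubled BOM still leaves a stray U+FEFF in the text.
double = (codecs.BOM_UTF8 + utf8_bom_bytes).decode("utf-8-sig")

print(ascii(single))  # '\xa3\r\n'
print(ascii(double))  # '\ufeff\xa3\r\n'
```

On the Python side, decoding stdin with "utf-8-sig" would cope with a single BOM, but as the last line suggests, it would not hide a doubled one.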