eryksun added the comment:

This isn't a Python bug. The Windows console doesn't properly support UTF-8. 
See issue 1602 and Drekin's win-unicode-console, an alternative REPL based on 
the wide-character (UCS-2) console API.

FWIW, I attached a debugger to conhost.exe under Windows 7 to inspect what's 
happening here. In the client, the CRT's read() function calls WinAPI ReadFile. 
For a console handle this calls either ReadConsoleA or (in Windows 8+) 
NtReadFile. Either way, most of the action happens in the server process, 
conhost.exe. 

The server's input buffer is Unicode, which gets encoded to CP 65001 (UTF-8) by 
calling WideCharToMultiByte. However, the server incorrectly assumes the current 
codepage is a Windows ANSI codepage with a one-to-one mapping, i.e. that each 
16-bit wchar_t maps to an 8-bit char in the current codepage. Since 'ł' gets 
UTF-8 encoded as the two-byte string b'\xc5\x82', the allocated buffer is too 
small by a byte. The server doesn't recover from this failure by allocating a 
larger buffer. It just reports back to the client process that it read 0 bytes. 
The CRT in turn sets the end-of-file (EOF) flag on the stdin FILE stream, which 
causes Python to exit 'normally'.
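
The sizing mistake can be modelled in a few lines of Python. This is only a 
sketch of the behaviour described above, not the actual conhost.exe code: the 
function name simulate_console_read is hypothetical, and the "return an empty 
result when the conversion does not fit" rule is my reading of what the 
debugger showed.

```python
def simulate_console_read(text, encoding="utf-8"):
    """Hypothetical model of the server-side read, not real conhost code."""
    # Count the 16-bit code units held in the Unicode input buffer.
    wchar_count = len(text.encode("utf-16-le")) // 2
    # The flawed assumption: every wchar_t converts to exactly one byte.
    buffer_size = wchar_count
    encoded = text.encode(encoding)
    if len(encoded) > buffer_size:
        # The converted text does not fit and there is no retry with a
        # larger buffer, so the client is told that 0 bytes were read.
        return b""
    return encoded


# ASCII-only input fits, because each character is one byte in UTF-8.
print(simulate_console_read("abc\r\n"))
# 'ł' needs two bytes in UTF-8, so the buffer is one byte short.
print(simulate_console_read("ł\r\n"))
# A single-byte ANSI codepage such as cp1250 satisfies the assumption.
print(simulate_console_read("ł\r\n", encoding="cp1250"))
```

An empty result from a read is what the client side treats as end-of-file, 
which is why a single non-ASCII character is enough to end the session.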

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23424>
_______________________________________