Eryk Sun <eryk...@gmail.com> added the comment:

Apparently handling non-BMP codes is broken in recent builds of the new console 
in Windows 10. I see this problem in build 18362 as well. It seems there have 
been updates that have changed the naive way the console used to handle 
surrogate codes as just regular UCS-2 codes, and this has disrupted the UTF-16 
wide-character API in several ways. This is probably related to the new support 
for virtual-terminal emulation and pseudoconsoles, since supporting a UTF-8 
stream interface has required significant redesign of the console backend.

Low-level ReadConsoleInputW and WriteConsoleInputW still work, but high-level 
ReadConsoleW now fails if it encounters a non-BMP surrogate pair, i.e. at least 
two key-event records with the non-BMP character encoded as a UTF-16 surrogate 
pair. It can be more than two input records depending on the source of input -- 
WriteConsoleInputW vs pasting from the clipboard -- in terms of KeyDown/KeyUp 
events or an Alt+Numpad sequence.
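
For reference, here is a minimal sketch of how a non-BMP character splits into the two UTF-16 surrogate codes that show up as separate key-event records. The helper name utf16_surrogates is made up for illustration; it is not part of any console or test API.

```python
def utf16_surrogates(ch):
    """Split a single non-BMP character into its UTF-16 surrogate pair."""
    code = ord(ch)
    if code < 0x10000:
        raise ValueError("character is in the BMP; no surrogate pair needed")
    offset = code - 0x10000
    high = 0xD800 + (offset >> 10)    # lead (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # trail (low) surrogate
    return high, low


# U+1F600 is outside the BMP, so it needs two 16-bit code units.
high, low = utf16_surrogates("\U0001F600")
print(hex(high), hex(low))
```

Each of the two values would be carried by its own key-event record (or more than one, depending on the input source).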

There are issues with reading from screen buffers as well. WriteConsoleW can 
still successfully write non-BMP characters, and these can be copied from the 
console fine. But ReadConsoleOutputCharacterW can no longer read them. This 
used to work, but now it 'succeeds' with 0 characters read if the screen-buffer 
region contains a non-BMP character. I checked the lower-level 
ReadConsoleOutputW function, and it's behaving differently now. It used to read 
a non-BMP character as two CHAR_INFO records containing the surrogate pair 
codes, but now it reads a non-BMP character as a single CHAR_INFO record 
containing a replacement character U+FFFD. 

I suppose we need to skip testing non-BMP and surrogate codes if the Windows 
version is (10, 0, 18362) and above.
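
A rough sketch of what such a version check might look like. The names BROKEN_SINCE and non_bmp_console_broken are hypothetical, and 18362 is only the earliest build I've looked at, so the real threshold may differ.

```python
import sys

# Earliest build where the problem was observed (an assumption, not a
# confirmed lower bound).
BROKEN_SINCE = (10, 0, 18362)


def non_bmp_console_broken(version=None):
    """Return True if (major, minor, build) is at or above BROKEN_SINCE.

    With version=None, query the running system; on non-Windows platforms
    this returns False.
    """
    if version is None:
        if sys.platform != "win32":
            return False
        # The first three fields are major, minor and build.
        version = sys.getwindowsversion()[:3]
    return tuple(version) >= BROKEN_SINCE
```

A test could then be decorated with something like unittest.skipIf(non_bmp_console_broken(), "console mishandles non-BMP codes").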

Also, _testconsole needs to support FlushConsoleInputBuffer. Every test that 
calls _testconsole.write_input should be isolated with a try/finally that 
flushes the input buffer at the end. For example:

    write_input(raw, 'spam')
    try:
        actual = input()
    finally:
        flush_input(raw)

If reading fails, 'spam' will be flushed from the input buffer.
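
The same pattern could be wrapped in a context manager so each test doesn't have to repeat the try/finally. This is only a sketch: pending_input is a made-up name, and write_input and flush_input are passed in as callables because _testconsole doesn't provide a flush function yet.

```python
from contextlib import contextmanager


@contextmanager
def pending_input(raw, text, write_input, flush_input):
    """Queue text on raw, then always flush raw when the block exits."""
    write_input(raw, text)
    try:
        yield
    finally:
        # Runs even if the read in the with-block raises, so leftover
        # input can't leak into the next test.
        flush_input(raw)
```

Usage would be along the lines of:

    with pending_input(raw, 'spam', write_input, flush_input):
        actual = input()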

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38325>
_______________________________________
_______________________________________________
Python-bugs-list mailing list