Eryk Sun <eryk...@gmail.com> added the comment:
Apparently handling non-BMP codes is broken in recent builds of the new console in Windows 10. I see this problem in build 18362 as well. It seems there have been updates that have changed the naive way the console used to handle surrogate codes as just regular UCS-2 codes, and this has disrupted the UTF-16 wide-character API in several ways. This is probably related to the new support for virtual-terminal emulation and pseudoconsoles, since supporting a UTF-8 stream interface has required significant redesign of the console backend.

Low-level ReadConsoleInputW and WriteConsoleInputW still work, but high-level ReadConsoleW now fails if it encounters a non-BMP character, i.e. at least two key-event records that encode the character as a UTF-16 surrogate pair. It can be more than two input records, in terms of KeyDown/KeyUp events or an Alt+Numpad sequence, depending on the source of input (WriteConsoleInputW vs pasting from the clipboard).

There are issues with reading from screen buffers as well. WriteConsoleW can still successfully write non-BMP characters, and these can be copied from the console fine. But ReadConsoleOutputCharacterW can no longer read them. This used to work, but now it "succeeds" with 0 characters read if the screen-buffer region contains a non-BMP character. I checked the lower-level ReadConsoleOutputW function, and it's behaving differently now. It used to read a non-BMP character as two CHAR_INFO records containing the surrogate pair codes, but now it reads a non-BMP character as a single CHAR_INFO record containing a replacement character U+FFFD.

I suppose we need to skip testing non-BMP and surrogate codes if the Windows version is (10, 0, 18362) and above.

Also, _testconsole needs to support FlushConsoleInputBuffer. Every test that calls _testconsole.write_input should be isolated with a try/finally that flushes the input buffer at the end.
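On the version check suggested above, a minimal sketch of what such a skip could look like follows. The helper name console_breaks_non_bmp, the constant BROKEN_CONSOLE_BUILD, and the test class are hypothetical names used only for illustration; they are not taken from the existing test suite. The threshold is the (10, 0, 18362) build mentioned above, and treating every later build as affected is an assumption.

```python
import sys
import unittest

# Build threshold taken from the comment above. Assumption: all later
# builds are affected too.
BROKEN_CONSOLE_BUILD = (10, 0, 18362)


def console_breaks_non_bmp(version=None):
    """Return True if (major, minor, build) is at or above the affected build.

    With no argument, query the running system. On non-Windows platforms
    this returns False, since sys.getwindowsversion() only exists on Windows.
    """
    if version is None:
        if sys.platform != "win32":
            return False
        # The first three fields are major, minor and build.
        version = sys.getwindowsversion()[:3]
    return tuple(version) >= BROKEN_CONSOLE_BUILD


class NonBMPConsoleTests(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(console_breaks_non_bmp(),
                     "console mishandles non-BMP and surrogate codes")
    def test_non_bmp_input(self):
        # Placeholder body; the real test would write a non-BMP
        # character to the console input buffer and read it back.
        pass
```

Passing an explicit tuple, such as console_breaks_non_bmp((10, 0, 17763)), makes the comparison easy to check without running on an affected machine.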
For example, for a test that writes 'spam' to the input buffer:

    write_input(raw, 'spam')
    try:
        actual = input()
    finally:
        flush_input(raw)

If reading fails, 'spam' will be flushed from the input buffer.

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38325>
_______________________________________