[issue13153] IDLE crashes when pasting non-BMP unicode char on Py3

Serhiy Storchaka Thu, 05 Nov 2015 23:16:12 -0800

Serhiy Storchaka added the comment:

My font has no Snake emoji, so I used the Cat Face emoji U+1F431 🐱 
(\xf0\x9f\x90\xb1 in UTF-8, \x3d\xd8\x31\xdc in UTF-16LE).
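
As a quick check of the byte sequences above, here is a minimal sketch in 
plain Python (no Tk involved; the variable names are only illustrative). It 
encodes the character both ways and derives the UTF-16 surrogate pair by hand.

```python
ch = '\U0001F431'  # CAT FACE, a non-BMP ("astral") character

utf8 = ch.encode('utf-8')         # 4 bytes
utf16le = ch.encode('utf-16-le')  # 2 code units (a surrogate pair), 4 bytes
print(utf8.hex())     # f09f90b1
print(utf16le.hex())  # 3dd831dc

# The surrogate pair computed by hand from the code point.
offset = ord(ch) - 0x10000
high = 0xD800 + (offset >> 10)
low = 0xDC00 + (offset & 0x3FF)
print(hex(high), hex(low))  # 0xd83d 0xdc31
```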


Move the cursor or press Backspace. I had to press Left 2 times to move the 
cursor to the beginning of the line, press Right 4 times to move it back to 
the end of the line, and press Backspace 4 times to remove everything. This is 
what is meant by "Tk doesn't support astral characters".

Get the text programmatically.

>>> text.get('1.0', '1.end')
'ðﾟﾐﾱ'
>>> print(ascii(text.get('1.0', '1.end')))
'\xf0\uff9f\uff90\uffb1'

On Linux the clipboard uses UTF-8, and this symbol is represented by the 
4-byte bytestring b'\xf0\x9f\x90\xb1' (that is why Tk sometimes interprets it 
as 4 characters). When you request the text content as Unicode, Tcl fails to 
decode the string from UTF-8 and falls back to Latin-1. Due to another bug, it 
sign-extends some of the bytes. When you programmatically insert the same 
string back, it is encoded to b'\xc3\xb0\xef\xbe\x9f\xef\xbe\x90\xef\xbe\xb1' 
and displayed as 'ðﾟﾐﾱ'.

On Windows the clipboard uses UTF-16LE, so you may see different results.

The underlying graphical system can support astral characters, but Tk fails to 
handle them correctly.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13153>
_______________________________________