E. Paine <paineeli...@gmail.com> added the comment:

Sorry, the point I was trying to make was that, unlike UTF-8, Tcl doesn't 
support variable-length characters; they are instead fixed at 16 bits (by 
default). So, while Python and UTF-8 are perfectly happy with the emoji, Tcl 
will not process the character correctly unless it is compiled with a 
particular build flag (which is why I said it was surprising that Chip showed 
at all). I have tested on Tcl 8.6.10 and encountered the same problem as 
described.
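
To make the size mismatch concrete, here is a rough Python sketch. It uses only 
the standard codecs, and the sample string and helper names are my own 
illustration rather than anything from the original report. It shows that a 
character outside the Basic Multilingual Plane cannot fit in a single 16-bit 
unit, which is the limit described above; it does not exercise Tcl itself.

```python
def utf16_code_units(ch):
    """Return how many 16-bit code units UTF-16 needs for one character."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
    return len(ch.encode("utf-16-le")) // 2


def outside_bmp(text):
    """Return the characters in text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Hypothetical sample: an ASCII letter, an accented letter and an emoji.
sample = "A\u00e9\U0001F600"

for ch in sample:
    print(
        f"U+{ord(ch):04X}: "
        f"{len(ch.encode('utf-8'))} UTF-8 byte(s), "
        f"{utf16_code_units(ch)} UTF-16 code unit(s)"
    )

# Characters that a fixed 16-bit representation cannot hold in one unit.
print([f"U+{ord(ch):04X}" for ch in outside_bmp(sample)])
```

Something like outside_bmp() could be used to check a string before handing it 
to tkinter, assuming the Tcl build in use has the 16-bit default.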

A further quote (granted, also old, but I cannot find anything to suggest this 
behaviour has been changed):
"Tcl can (currently) only represent characters within the Basic Multilingual 
Plane of Unicode, so there's no way that you can even feed an U+10000 into 
encoding convertto :-(. Fixing that is non-trivial, since some parts of Tcl 
(the C library) require a representation of strings where all characters take 
up the same number of bytes. It is possible to compile Tcl with that "number of 
bytes" set to 4 (meaning 32 bits per character), but it's rather wasteful, and 
has been reported not entirely compatible with Tk." 
[https://wiki.tcl-lang.org/page/utf-8]
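
For reference, the U+10000 mentioned in the quote is the first code point that 
needs two 16-bit units in UTF-16. The sketch below shows the standard UTF-16 
surrogate-pair arithmetic; the function name is my own, and this illustrates 
the encoding only, not how any particular Tcl build stores strings.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


for cp in (0x10000, 0x1F600):
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:04X} -> 0x{high:04X} 0x{low:04X}")
```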

If I can find the build flag mentioned, I will post it here for future 
reference.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41212>
_______________________________________
_______________________________________________
Python-bugs-list mailing list