On 3/9/2016 9:03 AM, BartC wrote:
> I've just tried a UTF-8 file and am getting some odd results. With a file
> containing [three euro symbols]:
> €€€
> (including a 3-byte UTF-8 marker at the start), and opened in text mode,
> Python 3 gives me this series of bytes (i.e. the ord() of each character):
> 239
> 187
> 191
> 226
> 8218
> 172
> 226
> 8218
> 172
> 226
> 8218
> 172
> And prints the resulting string as: ï»¿â‚¬â‚¬â‚¬. Although this latter
> might depend on my console's code page setting.
It definitely does.
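For what it's worth, those ord() values are exactly what you get when the file's UTF-8 bytes are decoded as cp1252. I am assuming cp1252 is the default text encoding on that machine; the post does not say so. A minimal sketch, building the bytes in memory instead of reading the file:

```python
# UTF-8 encoding of a BOM (U+FEFF) followed by three euro signs (U+20AC).
raw = ('\ufeff' + '\u20ac' * 3).encode('utf-8')

# Assumption: the file was decoded as cp1252, so each byte becomes its own
# character instead of three bytes combining into one euro sign.
wrong = raw.decode('cp1252')
wrong_ords = [ord(c) for c in wrong]
print(wrong_ords)

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
right = raw.decode('utf-8-sig')
right_ords = [ord(c) for c in right]
print(right_ords)
```

The same codec name can be passed when opening the file in text mode, as in open(path, encoding='utf-8-sig'), where path stands for whatever file is being read.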
> Changing it to UTF-8 however (CHCP 65001 in Windows)
CP65001 is MS's ugly pretense of Unicode compatibility. It has been
known to be buggy for over a decade, though some people claim to have
gotten some use out of it.
> gives me this error when I run the program again:
> ----------
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> LookupError: unknown encoding: cp65001
> This application has requested the Runtime to terminate it in an unusual
> way.
> Please contact the application's support team for more information.
> ----------
> So I think I'll skip Unicode handling to start off with! (I've already
> had plenty of fun and games with it in the past.)
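The LookupError means that this interpreter has no codec registered under the name cp65001. A rough way to check whether a given Python knows a codec name is codecs.lookup(); codec_available below is a hypothetical helper written for this illustration, not something from the standard library:

```python
import codecs

def codec_available(name):
    """Return True if this interpreter can find a codec called `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

# The result for 'cp65001' depends on the Python version and platform.
for name in ('utf-8', 'cp1252', 'cp65001'):
    print(name, codec_available(name))
```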
At least on Windows, use IDLE for the BMP subset of Unicode. Tk, and
hence tkinter and IDLE, can handle any character in the BMP. I believe
that which characters are actually displayed and which are shown as boxes
depends on the font. On my US Win10 system:
IDLE with Lucida Console:
>>> s = '€€€'
>>> s
'€€€'
In the console interpreter: '???' is printed.
--
Terry Jan Reedy