On 3/9/2016 9:03 AM, BartC wrote:

> I've just tried a UTF-8 file and getting some odd results. With a file
> containing [three euro symbols]:
>
> €€€
>
> (including a 3-byte utf-8 marker at the start), and opened in text mode,
> Python 3 gives me this series of bytes (ie. the ord() of each character):
>
> 239
> 187
> 191
> 226
> 8218
> 172
> 226
> 8218
> 172
> 226
> 8218
> 172
>
> And prints the resulting string as: €€€. Although this latter
> might depend on my console's code page setting.

It definitely does.
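
As for the ord() values: they are exactly what results if the file's UTF-8 bytes are decoded as cp1252, which is presumably the default encoding that open() picked on this system (an assumption; the post does not say which default was in effect). A minimal sketch that reproduces the numbers without touching a file, and shows that decoding as 'utf-8-sig' instead gives three euro signs with the marker stripped:

```python
import codecs

# The file's contents as bytes: the 3-byte UTF-8 marker plus three euro signs.
raw = codecs.BOM_UTF8 + '\u20ac\u20ac\u20ac'.encode('utf-8')

# Decoding as cp1252 (assumed default) turns every byte into one character.
wrong = raw.decode('cp1252')
wrong_ords = [ord(c) for c in wrong]
print(wrong_ords)

# Decoding as utf-8-sig drops the marker and recovers the euro signs.
right = raw.decode('utf-8-sig')
right_ords = [ord(c) for c in right]
print(right_ords)
```

Passing encoding='utf-8-sig' to open() should have the same effect when reading the actual file.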

> Changing it to UTF-8 however (CHCP 65001 in Windows)

CP65001 is MS's ugly pretense of Unicode compatibility. It has been known to be buggy for over a decade, though some people claim to have gotten some use out of it.

> gives me this error when I run the program again:
>
> ----------
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> LookupError: unknown encoding: cp65001
>
> This application has requested the Runtime to terminate it in an unusual
> way.
> Please contact the application's support team for more information.
> ----------
>
> So I think I'll skip Unicode handling to start off with! (I've already
> had plenty of fun and games with it in the past.)
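
The LookupError suggests that this Python build has no codec registered under the name cp65001, so it cannot set up the standard streams for a console in that code page. A rough sketch for checking which codec names a given interpreter knows; codec_available is a hypothetical helper, not something from the standard library, and the answer for cp65001 will vary by Python version and platform:

```python
import codecs

def codec_available(name):
    """Return True if this interpreter can look up a codec called `name`.

    Hypothetical helper for illustration only.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

# The result for cp65001 depends on the Python version and platform.
for name in ('utf-8', 'cp1252', 'cp65001'):
    print(name, codec_available(name))
```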

At least on Windows, use IDLE for the BMP subset of Unicode. tk, and hence tkinter and IDLE, can handle any character in the BMP. I believe that which characters are actually displayed and which are shown as boxes depends on the font. On my US Win10 system:

IDLE with Lucida Console:
>>> s = '€€€'
>>> s
'€€€'

In the console interpreter: '???' is printed.
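
When the console cannot show the characters, ascii() at least reveals which code points a string really holds. The second line below shows one way a '?' can arise, namely encoding to a code page that lacks the character with errors='replace'. cp437 is only an assumed console code page here, and whether this is the mechanism behind the '???' above is a guess:

```python
s = '\u20ac\u20ac\u20ac'  # three euro signs, written as escapes

# ascii() escapes every non-ASCII character, so it is safe on any console.
escaped = ascii(s)
print(escaped)

# cp437 (assumed console code page) has no euro sign; 'replace' substitutes '?'.
replaced = s.encode('cp437', errors='replace')
print(replaced)
```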


--
Terry Jan Reedy


--
https://mail.python.org/mailman/listinfo/python-list
