STINNER Victor <victor.stin...@haypocalc.com> added the comment:

Here are some results of my test of unicode2.py. I'm testing py3k on Windows 
XP, OEM: cp850, ANSI: cp1252.

Raster fonts
------------

With a fresh console, unicode2.py displays "?????????????????". input() accepts 
characters encodable to the OEM code page.

If I set the code page to 65001 (chcp program+set PYTHONIOENCODING=utf-8; or 
SetConsoleCP() + SetConsoleOutputCP()), it displays weird characters. input() 
accepts ASCII characters, but non-ASCII characters (encodable to the console 
and OEM code pages) display weird characters (smileys! control characters?).

Lucida console
--------------

With my system code page (OEM: cp850), characters not encodable to the code 
pages are displayed correctly. I can type some non-ASCII characters (those 
encodable to the code page). If I copy/paste characters not encodable to the 
code page, they are replaced by a similar glyph (e.g. Ł => L) or by ? (€ => ?).
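
To make "encodable to the code page" concrete, here is a small sketch using 
Python's own codecs for the two code pages of my test machine. The helper name 
encodable() is made up for this example. Note that Python's "replace" error 
handler only ever substitutes "?"; it does not reproduce the L-for-Ł 
substitution seen when pasting into the console.

```python
def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


OEM, ANSI = "cp850", "cp1252"  # code pages of the test machine described above

for ch in "é€Ł":
    # ascii() keeps the output printable whatever the console encoding is
    print(ascii(ch), {cp: encodable(ch, cp) for cp in (OEM, ANSI)})

# Encode the pasted test string with the OEM codec, replacing every
# unencodable character by "?" (Python's behaviour, not the console's).
replaced = "€-Ł".encode(OEM, errors="replace")
print(replaced)
```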

If I set the code page to 65001, all characters are still correctly displayed. 
But I cannot type non-ASCII characters anymore: input() fails with EOFError (I 
suppose that Python gets control characters).

Redirect output to a pipe
-------------------------

I patched unicode2.py to use sys.stdout.buffer instead of sys.stdout for 
UnicodeOutput stream. I also patched UnicodeOutput to replace \n by \r\n. 

It works correctly with any character. No UTF-8 BOM is written. But "Here 1" is 
written at the end. I suppose that sys.stdout should be flushed before the 
creation of UnicodeOutput.
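
A minimal sketch of the idea, not the actual patch: the class name 
Utf8CrlfWriter and the function wrap_stdout() are invented for illustration. 
It writes UTF-8 without a BOM to a binary stream, translates \n to \r\n, and 
flushes the old sys.stdout before wrapping it, which is what I suppose would 
avoid the misplaced "Here 1".

```python
import io
import sys


class Utf8CrlfWriter:
    """Hypothetical stand-in for the patched UnicodeOutput stream."""

    def __init__(self, raw):
        self.raw = raw  # a binary stream, e.g. sys.stdout.buffer

    def write(self, text):
        # Translate line endings, then encode; the "utf-8" codec emits no BOM.
        self.raw.write(text.replace("\n", "\r\n").encode("utf-8"))
        return len(text)

    def flush(self):
        self.raw.flush()


def wrap_stdout():
    # Flush text still buffered in the old sys.stdout first, so that it is
    # not written after the output of the new wrapper.
    sys.stdout.flush()
    return Utf8CrlfWriter(sys.stdout.buffer)


# Demonstration on an in-memory stream instead of a real pipe
demo = io.BytesIO()
out = Utf8CrlfWriter(demo)
out.write("€-Ł\n")
out.flush()
print(demo.getvalue())
```

Text that already contains \r\n would end up with \r\r\n in this sketch; a 
real implementation would have to decide how to handle that.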

But it always uses UTF-8. I don't know if UTF-8 is well supported by every 
application on Windows.

Without unicode2.py, only characters encodable to OEM code page are supported, 
and \n is used as end of line string.

Let's try to summarize
----------------------

Tests:
 d1) Display characters encodable to the console code page
 t1) Type characters encodable to the console code page
 d2) Display characters not encodable to any code page
 t2) Type characters not encodable to any code page

I'm using Windows with OEM=cp850 and ANSI=cp1252. For test (t2), I copy €-Ł and 
paste it to the console (right click on the window title > Edit > Paste).

Raster fonts, console=cp850:

d1) ok
t1) ok
d2) FAIL: €-Ł is displayed ?-L
t2) FAIL: €-Ł is read as ?-L

Raster fonts, console=cp65001:

d1) FAIL: é is displayed as 2 strange glyphs
t1) FAIL: EOFError
d2) FAIL: only display unreadable glyphs
t2) FAIL: EOFError
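
One possible explanation for the "2 strange glyphs" (an assumption, I did not 
verify it) is that the two UTF-8 bytes of é are each drawn as one glyph of the 
OEM code page. The sketch below simulates that; misread() is a name invented 
for this example.

```python
def misread(text, written_as="utf-8", shown_as="cp850"):
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(written_as).decode(shown_as, errors="replace")


garbled = misread("é")
# One character in, two characters out: one glyph per UTF-8 byte
print(len("é"), len(garbled), ascii(garbled))

# ASCII text is unaffected, which matches ASCII input/output still working
print(misread("abc"))
```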

Lucida console, console=cp850:

d1) ok
t1) ok
d2) ok
t2) FAIL: €-Ł is read as ?-L

Lucida console, console=cp65001:

d1) ok
t1) FAIL: EOFError
d2) ok
t2) FAIL: EOFError

So, setting the console code page to 65001 doesn't solve any issue, but it 
breaks the input (input with the keyboard or pasting text).

With Raster fonts or Lucida console, it's possible to display characters 
encodable to the code page. That is not new: it's already possible with 
Python 3. But for characters not encodable to the code page, display works with 
unicode2.py and Lucida console, which is something new :-)

For the input, I suppose that we also need to use a Windows console function 
to support unencodable characters.
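
As a rough, untested sketch of what such a function could look like, assuming 
ReadConsoleW() from kernel32 is the right API (I did not try it here), the 
console can be read as UTF-16 through ctypes. The helper name 
read_console_text() is invented for this example, and it simply returns None 
on platforms other than Windows.

```python
import ctypes
import sys

STD_INPUT_HANDLE = -10  # documented identifier of the standard input device


def read_console_text(max_chars=1024):
    """Read one line from the console as text (hypothetical helper).

    Returns None when not running on Windows.
    """
    if sys.platform != "win32":
        return None

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [ctypes.c_uint32]
    kernel32.GetStdHandle.restype = ctypes.c_void_p
    kernel32.ReadConsoleW.argtypes = [
        ctypes.c_void_p,                  # hConsoleInput
        ctypes.c_void_p,                  # lpBuffer
        ctypes.c_uint32,                  # nNumberOfCharsToRead
        ctypes.POINTER(ctypes.c_uint32),  # lpNumberOfCharsRead
        ctypes.c_void_p,                  # pInputControl (unused here)
    ]
    kernel32.ReadConsoleW.restype = ctypes.c_int

    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE & 0xFFFFFFFF)
    buf = ctypes.create_unicode_buffer(max_chars)
    count = ctypes.c_uint32(0)
    ok = kernel32.ReadConsoleW(handle, buf, max_chars,
                               ctypes.byref(count), None)
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    # The result normally still contains the trailing "\r\n"
    return buf[:count.value]
```

This only works when stdin is a real console; with a redirected stdin the call 
fails and the normal byte-oriented path would still be needed.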

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1602>
_______________________________________