[issue5127] UnicodeEncodeError - I can't even see license

STINNER Victor Tue, 03 Feb 2009 05:14:12 -0800

STINNER Victor <[email protected]> added the comment:

amaury> Since r56395, ord() and chr() accept and return surrogate pairs 
amaury> even in narrow builds.


Note: My examples are made with Python 2.x.

> The goal is to remove most differences between narrow and wide unicode
> builds (except for string lengths, indices or slices)

It would be nice to get the same behaviour in Python 2.x and 3.x to help 
migration from Python2 to Python3 ;-)

unichr() (in Python 2.x) documentation is correct. But I would approciate to 
support surrogates using unichr() which means also changing ord() behaviour.

> To address this problem, I suggest to change all functions in
> unicodectype.c so that they accept Py_UCS4 characters (instead of
> Py_UNICODE).

Why? Using surrogates, you can use 16-bits Py_UNICODE to store non-BMP 
characters (code > 0xffff).

--

I can open a new issue if you agree that we can change unichr() / ord() 
behaviour on narrow build. We may ask on the mailing list?

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue5127>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue5127] UnicodeEncodeError - I can't even see license

Reply via email to