Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:

> I don't see the point in changing the various conversion APIs in the
> unicode database to return Py_UCS4 when there are no conversions that
> map code points between BMP and non-BMP.

For consistency: if Py_UNICODE_ISPRINTABLE is changed to take Py_UCS4, 
Py_UNICODE_TOLOWER should also take Py_UCS4, and must return the same type.
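
To illustrate the return-type point, here is a minimal sketch, not the actual CPython code. The names narrow_char, wide_char, toy_tolower and toy_tolower_narrow_result are hypothetical stand-ins for a 16-bit Py_UNICODE, Py_UCS4, and a case-mapping function. The toy mapping only covers ASCII and the Deseret capitals (U+10400..U+10427, whose lowercase forms sit 0x28 higher). It shows that once the argument can be a non-BMP code point, a 16-bit result would truncate the mapped value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real typedefs live in the CPython headers. */
typedef uint16_t narrow_char;  /* plays the role of a 16-bit Py_UNICODE */
typedef uint32_t wide_char;    /* plays the role of Py_UCS4 */

/* Toy lowercase mapping: ASCII capitals and Deseret capitals only. */
wide_char toy_tolower(wide_char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return ch + ('a' - 'A');
    if (ch >= 0x10400 && ch <= 0x10427)
        return ch + 0x28;           /* Deseret lowercase block */
    return ch;
}

/* The same mapping squeezed through a 16-bit return type:
   the upper bits of a non-BMP result are lost. */
narrow_char toy_tolower_narrow_result(wide_char ch)
{
    return (narrow_char)toy_tolower(ch);
}

/* Small self-check of the behaviour described above. */
void check_toy_tolower(void)
{
    assert(toy_tolower(0x10400) == 0x10428);
    assert(toy_tolower_narrow_result(0x10400) == 0x0428);
}
```

With a 32-bit argument and a 32-bit result, U+10400 maps to U+10428 as expected; with a 16-bit result the caller would see U+0428 instead.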

> In order to solve the problem in question (unicode_repr() failing), 
> we should change the various property checking APIs to accept Py_UCS4
> input data. This needlessly increases the type database size without
> real benefit.

I'm not sure I understand. To me, the 'real benefit' is that it solves the 
problem in question.

Yes, this increases the type database: there are 300 more "case" statements in 
_PyUnicode_ToNumeric(), and the PyUnicode_TypeRecords array needs 1068 more 
bytes.
On Windows, VS9.0 release build, unicodectype.obj grows from 86Kb to 94Kb; 
python32.dll is exactly 1.5Kb larger (from 2219Kb to 2220.5Kb);
the memory usage of the just-started interpreter is about 32K larger (around 
5M).  These look like reasonable figures to me.
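
As a rough illustration of where the extra "case" statements come from, here is a toy switch-based lookup. It is a sketch only: code_point and toy_to_numeric are hypothetical names, the table has four entries instead of the real generated one, and returning -1.0 for "no numeric value" is just the convention chosen for this example. The point is that the non-BMP cases can only be reached when the parameter is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t code_point;  /* hypothetical stand-in for Py_UCS4 */

/* Toy numeric lookup; returns -1.0 when the code point has no
   numeric value in this tiny table. */
double toy_to_numeric(code_point ch)
{
    switch (ch) {
    case 0x0030:  return 0.0;  /* DIGIT ZERO */
    case 0x00BD:  return 0.5;  /* VULGAR FRACTION ONE HALF */
    case 0x10107: return 1.0;  /* AEGEAN NUMBER ONE (non-BMP) */
    case 0x1D7CE: return 0.0;  /* MATHEMATICAL BOLD DIGIT ZERO (non-BMP) */
    default:      return -1.0;
    }
}

/* Small self-check: a non-BMP entry is reachable with a 32-bit argument. */
void check_toy_to_numeric(void)
{
    assert(toy_to_numeric(0x10107) == 1.0);
    assert(toy_to_numeric(0x0041) == -1.0);
}
```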

> For that to work properly we'll have to either make sure that
> extensions get recompiled if they use these changed APIs, or we
> provide an additional set of UCS2 APIs that extend the Py_UNICODE
> input value to a Py_UCS4 value before calling the underlying Py_UCS4
> API.

Extensions that use these changed APIs need to be recompiled, or they won't 
load: existing modules link against symbols like _PyUnicodeUCS2_IsPrintable, 
whereas the future interpreter will define _PyUnicode_IsPrintable.
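
The link failure follows from how a build-specific symbol name ends up in a compiled extension. Below is a minimal sketch of that mechanism, assuming the renaming is done with a preprocessor #define in a header. All names (toy_IsPrintable, toyUCS2_IsPrintable, extension_check) are hypothetical, and the "printable" rule is a placeholder.

```c
#include <assert.h>

/* Hypothetical header fragment for a narrow build: the public name is
   rewritten by the preprocessor to a build-specific symbol. */
#define toy_IsPrintable toyUCS2_IsPrintable

/* After preprocessing, this defines the symbol toyUCS2_IsPrintable. */
int toy_IsPrintable(unsigned short ch)
{
    return ch >= 0x20 && ch != 0x7F;  /* placeholder rule, not Unicode-aware */
}

/* "Extension" code written against the public name; the object file
   nevertheless references toyUCS2_IsPrintable. */
int extension_check(unsigned short ch)
{
    return toy_IsPrintable(ch);
}

/* Small self-check: both spellings reach the same function. */
void check_symbol_renaming(void)
{
    assert(extension_check(0x41) == 1);
    assert(toyUCS2_IsPrintable(0x41) == 1);
}
```

If a later version of the header dropped the #define and the library exported only toy_IsPrintable, an already-compiled extension_check would still look for toyUCS2_IsPrintable and fail to resolve it until it was rebuilt.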

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5127>
_______________________________________