[issue5127] UnicodeEncodeError - I can't even see license

Marc-Andre Lemburg Mon, 05 Oct 2009 04:41:25 -0700

Marc-Andre Lemburg <[email protected]> added the comment:

Amaury Forgeot d'Arc wrote:
> 
> Amaury Forgeot d'Arc <[email protected]> added the comment:
> 
>> we should make sure that it's not possible to load an extension
>> compiled with 3.1 in 3.2 to prevent segfaults and buffer overruns.
> 
> This is the case with this patch: today all these functions
> (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase) are actually #defines to
> _PyUnicodeUCS2_* or _PyUnicodeUCS4_*.
> The patch removes the #defines: 3.1 modules that call
> _PyUnicodeUCS4_IsAlpha wouldn't load into a 3.2 interpreter.


True, but we can do better. For narrow builds, the API currently
exposes the UCS2 APIs. We'd need to expose the UCS4 APIs *in addition*
to those APIs and have the UCS2 APIs redirect to the UCS4 ones.

For wide builds, we don't need to change anything.

>> The change affects the Unicode type database which is implemented
>> in unicodectype.c, not the Unicode database, which already uses UCS4.
> 
> Are you referring to the _PyUnicode_TypeRecord structure?
> The first three fields only contains values up to 65535, so they could
> use "unsigned short" even for UCS4 builds.

I haven't checked, but it's certainly possible to have a code point
use a non-BMP lower/upper/title case mapping, so this should be
made possible as well, if we're going to make changes to the type
database.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue5127>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue5127] UnicodeEncodeError - I can't even see license

Reply via email to