Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:

> I must be missing some detail, but what does the Unicode database
> have to do with the unicodeobject.c C API?

Ah, now I understand your concerns. My suggestion is to change only the 20 functions in unicodectype.c (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase, ...) and to make no change in unicodeobject.c at all. They all take a single code point as argument, and some also return a single code point. Changing these functions is backwards compatible.

I am attaching a patch so we can discuss concrete code (tests are missing).

Another effect of the patch: unicodedata.numeric('\N{AEGEAN NUMBER TWO}') can return 2.0.

The str.isalpha() method (and the other str methods) did not change: they still split surrogate pairs.

----------
keywords: +patch
Added file: http://bugs.python.org/file12934/unicodectype_ucs4.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5127>
_______________________________________
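
To make the surrogate-pair point in the comment above concrete, here is a minimal Python 3 sketch. It is not taken from the patch: the helper names split_code_point and join_surrogates are hypothetical, and the pair is simulated with plain integers because current Python 3 strings already hold full code points. It only shows why a function that receives a whole code point can answer questions that a function fed one surrogate at a time cannot.

```python
import unicodedata


def split_code_point(cp):
    """Split a non-BMP code point into a UTF-16 (high, low) surrogate pair.

    Hypothetical helper for illustration; mirrors what a narrow build stores.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def join_surrogates(high, low):
    """Combine a UTF-16 surrogate pair back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Look the character up by name rather than hard-coding its code point.
char = unicodedata.lookup("AEGEAN NUMBER TWO")
high, low = split_code_point(ord(char))

# Each half on its own is a surrogate (category "Cs") with no numeric value,
# which is what a per-code-unit lookup sees.
for unit in (high, low):
    assert unicodedata.category(chr(unit)) == "Cs"
    assert unicodedata.numeric(chr(unit), None) is None

# The rejoined code point carries the real database properties.
whole = join_surrogates(high, low)
print(hex(whole), unicodedata.numeric(chr(whole)))
```

The design point is the same as in the comment: only the lookup side has to accept a full code point, while callers that iterate over code units keep their current behaviour.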