Marc-Andre Lemburg <m...@egenix.com> added the comment:

Alexander Belopolsky wrote:
>
> Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
>
> After a bit of svn archeology, it does appear that Arabic-Indic digits'
> support was deliberate at least in the sense that the feature was tested for
> when the code was first committed.  See r15000.

As I mentioned on python-dev
(http://mail.python.org/pipermail/python-dev/2010-November/106077.html)
this support was added intentionally.

> The test migrated from file to file over the last 10 years, but it is still
> present in test_float.py:
>
>     self.assertEqual(float(b" \u0663.\u0661\u0664 ".decode('raw-unicode-escape')), 3.14)
>
> (It should probably be now rewritten using a string literal.)
>
> I am now attaching the patch (issue10557.diff) that fixes the bug without
> sacrificing non-ASCII digit support.
> If this approach is well-received, I would like to replace all calls to
> PyUnicode_EncodeDecimal() with the calls to the new
> _PyUnicode_EncodeDecimalUTF8() and deprecate Latin-1-oriented
> PyUnicode_EncodeDecimal().

It would be better to copy and iterate over the Unicode string first,
replacing any decimal code points with ASCII ones and then call the
UTF-8 encoder.

The code as it stands is very inefficient, since it will most likely
run the memcpy() part for every code point after the first non-ASCII
decimal one.

> For the future, I note that starting with Unicode 6.0.0, the Unicode
> Consortium promises that
>
> """
> Characters with the property value Numeric_Type=de (Decimal) only occur in
> contiguous ranges of 10 characters, with ascending numeric values from 0 to 9
> (Numeric_Value=0..9).
> """
>
> This makes it very easy to check a numeric string does not contain a mix of
> digits from different scripts.

I'm not sure why you'd want to check for such ranges.

> I still believe that proper API should require explicit choice of language or
> locale before allowing digits other than 0-9 just as int() would not accept
> hexadecimal digits without explicit choice of base >= 16.  But this would be
> a subject of a feature request.

Since when do we require a locale or language to be specified when
using Unicode ?
The codecs, Unicode methods and other Unicode support features happily work
with all kinds of languages, mixed or not, without any such specification.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10557>
_______________________________________
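
For illustration only, here is a rough Python-level sketch of the
"normalize first, then encode" approach suggested above: copy the string,
replace every decimal code point with its ASCII counterpart, and only then
run the UTF-8 encoder. The function names normalize_decimal_digits() and
encode_decimal_utf8() are hypothetical and are not part of the attached
patch or of the C API; the actual change would be made in C.

```python
import unicodedata


def normalize_decimal_digits(text: str) -> str:
    """Return a copy of text with each Unicode decimal digit mapped to ASCII.

    Hypothetical helper, not taken from issue10557.diff.
    """
    out = []
    for ch in text:
        # unicodedata.decimal() returns the default (None here) for any
        # character that is not a decimal digit.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else chr(ord("0") + value))
    return "".join(out)


def encode_decimal_utf8(text: str) -> bytes:
    """Normalize the digits in one pass, then call the UTF-8 encoder once."""
    return normalize_decimal_digits(text).encode("utf-8")


if __name__ == "__main__":
    # Same Arabic-Indic input as the test quoted from test_float.py.
    print(encode_decimal_utf8(" \u0663.\u0661\u0664 "))
    print(float(normalize_decimal_digits(" \u0663.\u0661\u0664 ")))
```

Because the digit replacement happens in a single pass over a copy, the
encoder never has to special-case individual code points, which is the
inefficiency the comment above points at.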