Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:

After a bit of svn archeology, it does appear that support for Arabic-Indic 
digits was deliberate, at least in the sense that the feature was tested when 
the code was first committed. See r15000.

The test has migrated from file to file over the last 10 years, but it is still 
present in test_float.py:

        self.assertEqual(float(b"  \u0663.\u0661\u0664  ".decode('raw-unicode-escape')), 3.14)

(It should probably now be rewritten using a string literal.)
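A minimal sketch of how the quoted test might read with a plain string literal, assuming Python 3, where "\u0663" inside a str literal already denotes U+0663 ARABIC-INDIC DIGIT THREE. The class and method names are hypothetical and are not the ones used in test_float.py.

```python
import unittest

class FloatNonAsciiDigitsTest(unittest.TestCase):
    def test_arabic_indic_digits(self):
        # U+0663, U+0661, U+0664 are ARABIC-INDIC DIGIT THREE, ONE and FOUR;
        # surrounding whitespace is ignored by float().
        self.assertEqual(float("  \u0663.\u0661\u0664  "), 3.14)
```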

I am now attaching the patch (issue10557.diff) that fixes the bug without 
sacrificing non-ASCII digit support.

If this approach is well received, I would like to replace all calls to 
PyUnicode_EncodeDecimal() with calls to the new 
_PyUnicode_EncodeDecimalUTF8() and deprecate the Latin-1-oriented 
PyUnicode_EncodeDecimal().
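A rough pure-Python model of the difference, under stated assumptions: it loosely follows the header comment for PyUnicode_EncodeDecimal() (whitespace becomes a space, any Unicode decimal digit becomes its ASCII digit, other characters are passed through) and models "Latin-1-oriented" versus UTF-8 simply as the codec applied to the passed-through characters, with strict error handling. The function name encode_decimal is hypothetical; the real behaviour of _PyUnicode_EncodeDecimalUTF8() is whatever the attached patch defines.

```python
import unicodedata

def encode_decimal(s: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical model: normalize digits and whitespace, then encode the rest."""
    out = []
    for ch in s:
        if ch.isspace():
            out.append(" ")  # any Unicode whitespace becomes an ASCII space
        else:
            value = unicodedata.decimal(ch, None)
            # Decimal digits from any script become ASCII '0'..'9';
            # everything else is left for the codec to handle.
            out.append(str(value) if value is not None else ch)
    # With "latin-1", characters above U+00FF raise UnicodeEncodeError;
    # with "utf-8", every character can be encoded.
    return "".join(out).encode(encoding)
```

In this model both codecs handle "\u0663.\u0661\u0664" identically; they differ only for non-digit characters outside Latin-1, such as "\u20ac".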

For the future, I note that starting with Unicode 6.0.0, the Unicode Consortium 
promises that

"""
Characters with the property value Numeric_Type=de (Decimal) only occur in 
contiguous ranges of 10 characters, with ascending numeric values from 0 to 9 
(Numeric_Value=0..9).
"""

This makes it very easy to check that a numeric string does not contain a mix 
of digits from different scripts: every digit's code point minus its numeric 
value must give the same zero code point.
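A minimal sketch of that check, assuming Python's unicodedata module (whose database in current Python releases is newer than Unicode 6.0.0). Because each block of ten digits is contiguous and ascending, ord(ch) minus the digit's value is the code point of that block's zero. The function name mixes_digit_scripts is hypothetical.

```python
import unicodedata

def mixes_digit_scripts(s: str) -> bool:
    """Return True if the decimal digits in s come from more than one block of ten."""
    zeros = set()
    for ch in s:
        value = unicodedata.decimal(ch, None)  # None for non-digits such as '.'
        if value is not None:
            # Contiguous, ascending ranges: code point minus value is the block's zero.
            zeros.add(ord(ch) - value)
    return len(zeros) > 1

print(mixes_digit_scripts("\u0663.\u0661\u0664"))  # False: all Arabic-Indic
print(mixes_digit_scripts("3.\u0661\u0664"))        # True: ASCII '3' with Arabic-Indic digits
```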

I still believe that a proper API should require an explicit choice of language 
or locale before accepting digits other than 0-9, just as int() does not accept 
hexadecimal digits without an explicit base >= 16. But that would be the 
subject of a feature request.
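To make the int() analogy concrete: int("ff") raises ValueError, while int("ff", 16) returns 255. An opt-in for digit scripts could follow the same pattern. The sketch below is purely illustrative: parse_float_in_script and its zero parameter are hypothetical and do not exist in Python; the caller names the zero of the one digit block it is willing to accept.

```python
import unicodedata

def parse_float_in_script(s: str, zero: str = "0") -> float:
    """Hypothetical opt-in parser: accept only digits from the block whose zero is `zero`."""
    if unicodedata.decimal(zero, None) != 0:
        raise ValueError(f"{zero!r} is not a decimal zero")
    base = ord(zero)
    normalized = []
    for ch in s.strip():
        value = unicodedata.decimal(ch, None)
        if value is None:
            normalized.append(ch)  # '.', signs, 'e' etc. are left for float() to judge
        elif ord(ch) - value == base:
            normalized.append(str(value))  # digit from the chosen block -> ASCII
        else:
            raise ValueError(f"digit {ch!r} is not from the selected script")
    return float("".join(normalized))

print(int("ff", 16))                                                # 255
print(parse_float_in_script("3.14"))                                # 3.14
print(parse_float_in_script("\u0663.\u0661\u0664", zero="\u0660"))  # 3.14
```

With the default zero of "0", a string such as "\u0663.14" is rejected, mirroring how int("ff") is rejected without a base.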

----------
Added file: http://bugs.python.org/file19865/issue10557.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10557>
_______________________________________
_______________________________________________
Python-bugs-list mailing list