Alexander Belopolsky added the comment:

As a design principle, "accept what's unambiguous in any locale" is reasonable, 
but it is hard to apply consistently.  I would agree that the status quo is 
hard to defend.  After a long discussion, it was agreed that fullwidth 
digits should be accepted, so float(u'１２３') is now valid, but 
float('＋１２３'), float('－１２３') and float('１２⒊') are not.  The last 
example is

>>> '\N{FULLWIDTH DIGIT ONE}\N{FULLWIDTH DIGIT TWO}\N{DIGIT THREE FULL STOP}'
'１２⒊'

All these variations can be neatly addressed by applying NFKC or NFKD 
normalization to unicode data before conversion:

>>> float(unicodedata.normalize('NFKD', '＋１２３'))
123.0
>>> float(unicodedata.normalize('NFKD', '－１２３'))
-123.0
>>> float(unicodedata.normalize('NFKC', '１２⒊'))
123.0

This would even allow parsing fullwidth hexadecimal numbers:

>>> float.fromhex(unicodedata.normalize('NFKC', '０ｘ⒈７ｐ３'))
11.5
>>> int(unicodedata.normalize('NFKC', '７Ｆ'), 16)
127

but would not help with the MINUS SIGN.
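
To illustrate the MINUS SIGN gap, here is a minimal sketch of a wrapper that 
normalizes with NFKC and then maps U+2212 to HYPHEN-MINUS by hand.  The name 
parse_float is hypothetical and not part of any stdlib API; this only shows 
one way the extra mapping could be layered on top of normalization.

```python
import unicodedata

MINUS_SIGN = "\N{MINUS SIGN}"  # U+2212, has no compatibility decomposition


def parse_float(text):
    """Hypothetical helper: NFKC-normalize, map MINUS SIGN, then call float()."""
    normalized = unicodedata.normalize("NFKC", text)
    # NFKC leaves U+2212 unchanged, so it has to be translated explicitly.
    normalized = normalized.replace(MINUS_SIGN, "-")
    return float(normalized)


# Normalization alone does not turn MINUS SIGN into HYPHEN-MINUS.
assert unicodedata.normalize("NFKC", MINUS_SIGN) == MINUS_SIGN

value = parse_float(
    "\N{MINUS SIGN}\N{FULLWIDTH DIGIT ONE}"
    "\N{FULLWIDTH DIGIT TWO}\N{FULLWIDTH DIGIT THREE}"
)
print(value)  # -123.0
```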

Allowing '\N{MINUS SIGN}' is particularly attractive because arguably Unicode 
text should prefer it to the ambiguous '\N{HYPHEN-MINUS}', but by the same 
token fractions.Fraction() should accept '\N{FRACTION SLASH}' in addition to 
the legacy '\N{SOLIDUS}'.
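
The same idea could be sketched for fractions.  The helper below is 
hypothetical (parse_fraction is not an existing function); it assumes that 
mapping FRACTION SLASH to SOLIDUS and MINUS SIGN to HYPHEN-MINUS after NFKC 
normalization is the desired behavior, which is exactly the kind of question 
a broader proposal would have to settle.

```python
import unicodedata
from fractions import Fraction


def parse_fraction(text):
    """Hypothetical helper: accept FRACTION SLASH and MINUS SIGN in Fraction input."""
    normalized = unicodedata.normalize("NFKC", text)
    # Neither character is changed by NFKC, so map both explicitly.
    normalized = normalized.replace("\N{FRACTION SLASH}", "/")
    normalized = normalized.replace("\N{MINUS SIGN}", "-")
    return Fraction(normalized)


print(parse_fraction("\N{MINUS SIGN}3\N{FRACTION SLASH}4"))  # -3/4
```

A side effect of normalizing first is that precomposed characters such as 
'\N{VULGAR FRACTION ONE HALF}' decompose into digits around a FRACTION SLASH 
and are therefore accepted as well.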

Overall, I think this situation calls for a PEP-sized proposal and discussion 
about handling Unicode numerical data throughout the stdlib rather than a 
case-by-case discussion of the various quirks in the current version.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6632>
_______________________________________