[issue6632] Include more chars in the decimal codec

Alexander Belopolsky Sun, 09 Jun 2013 18:56:39 -0700

Alexander Belopolsky added the comment:

As a design principle, "accept what's unambiguous in any locale" is reasonable, 
but it is hard to apply consistently.  I would agree that the status quo is 
hard to defend.  After a long discussion, it was agreed that fullwidth 
digits should be accepted, so float(u'１２３') is now valid, but 
float('＋１２３'), float('－１２３') and float('１２⒊') are not.  The last example is:


>>> '\N{FULLWIDTH DIGIT ONE}\N{FULLWIDTH DIGIT TWO}\N{DIGIT THREE FULL STOP}'
'１２⒊'

All these variations can be neatly addressed by applying NFKC or NFKD 
normalization to unicode data before conversion:

>>> import unicodedata
>>> float(unicodedata.normalize('NFKD', '＋１２３'))
123.0
>>> float(unicodedata.normalize('NFKD', '－１２３'))
-123.0
>>> float(unicodedata.normalize('NFKC', '１２⒊'))
123.0

This would even allow parsing fullwidth hexadecimal numbers:

>>> float.fromhex(unicodedata.normalize('NFKC', '０ｘ⒈７ｐ３'))
11.5
>>> int(unicodedata.normalize('NFKC', '７Ｆ'), 16)
127

but would not help with the MINUS SIGN.
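
One possible way to cover the MINUS SIGN as well is to follow normalization 
with an explicit translation step.  The following is only a minimal sketch; 
to_ascii_number() is a hypothetical helper name, not an existing stdlib 
function, and the single-entry mapping table is an assumption about which 
characters one would want to fold:

```python
import unicodedata

# Hypothetical mapping: NFKC leaves U+2212 MINUS SIGN unchanged, so it is
# folded to HYPHEN-MINUS explicitly after normalization.
_EXTRA_SIGNS = {ord('\N{MINUS SIGN}'): '-'}

def to_ascii_number(text):
    """Normalize compatibility characters, then fold MINUS SIGN to '-'."""
    return unicodedata.normalize('NFKC', text).translate(_EXTRA_SIGNS)

print(float(to_ascii_number('\N{MINUS SIGN}１２３')))  # -123.0
```

Whether such folding belongs inside float() itself or in a separate helper 
is exactly the kind of question a broader proposal would have to settle.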

Allowing '\N{MINUS SIGN}' is particularly attractive because arguably unicode 
text should prefer it to the ambiguous '\N{HYPHEN-MINUS}', but by the same token 
fractions.Fraction() should accept '\N{FRACTION SLASH}' in addition to the 
legacy '\N{SOLIDUS}'.
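
To illustrate the Fraction case, here is a rough sketch of what a wrapper 
could look like today.  parse_fraction() is a hypothetical name introduced 
only for this example; it applies NFKC (which decomposes characters such as 
'\N{VULGAR FRACTION ONE HALF}' into digits around a FRACTION SLASH) and then 
replaces the FRACTION SLASH with a SOLIDUS before calling Fraction():

```python
import unicodedata
from fractions import Fraction

def parse_fraction(text):
    """Hypothetical helper: accept FRACTION SLASH where Fraction() wants '/'."""
    normalized = unicodedata.normalize('NFKC', text)
    # NFKC keeps U+2044 FRACTION SLASH as is, so replace it explicitly.
    return Fraction(normalized.replace('\N{FRACTION SLASH}', '/'))

print(parse_fraction('3\N{FRACTION SLASH}4'))          # 3/4
print(parse_fraction('\N{VULGAR FRACTION ONE HALF}'))  # 1/2
```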

Overall, I think this situation calls for a PEP-size proposal and discussion 
about handling unicode numerical data throughout stdlib rather than a 
case-by-case discussion of the various quirks in the current version.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6632>
_______________________________________