Alexander Belopolsky added the comment:

As a design principle, "accept what's unambiguous in any locale" is reasonable, but it is hard to apply consistently. I would agree that the status quo is hard to defend. After a long discussion, it was agreed that fullwidth digits should be accepted, and now float(u'１２３') is valid, but not float('＋１２３'), float('－１２３') or float('１２⒊'). The last example is
>>> '\N{FULLWIDTH DIGIT ONE}\N{FULLWIDTH DIGIT TWO}\N{DIGIT THREE FULL STOP}'
'１２⒊'

All these variations can be neatly addressed by applying NFKC or NFKD normalization to unicode data before conversion:

>>> import unicodedata
>>> float(unicodedata.normalize('NFKD', '＋１２３'))
123.0
>>> float(unicodedata.normalize('NFKD', '－１２３'))
-123.0
>>> float(unicodedata.normalize('NFKC', '１２⒊'))
123.0

This would even allow parsing fullwidth hexadecimal numbers:

>>> float.fromhex(unicodedata.normalize('NFKC', '０ｘ⒈７ｐ３'))
11.5
>>> int(unicodedata.normalize('NFKC', '７Ｆ'), 16)
127

but would not help with the MINUS SIGN, which has no compatibility decomposition and is therefore left unchanged by normalization.

Allowing '\N{MINUS SIGN}' is particularly attractive because arguably unicode text should prefer it to the ambiguous '\N{HYPHEN-MINUS}', but by the same token fractions.Fraction() should accept '\N{FRACTION SLASH}' in addition to the legacy '\N{SOLIDUS}'.

Overall, I think this situation calls for a PEP-size proposal and discussion about handling unicode numerical data throughout the stdlib, rather than a case-by-case discussion of the various quirks in the current version.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6632>
_______________________________________
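
For illustration only, the normalization approach described in the comment above could be wrapped in a small helper. This is a rough sketch, not anything proposed in the issue or present in the stdlib: the name parse_float is hypothetical, and mapping MINUS SIGN to HYPHEN-MINUS is an assumption added here to cover the one case that normalization alone does not handle.

```python
import unicodedata


def parse_float(text):
    """Hypothetical helper: normalize compatibility characters, then parse.

    NFKC folds fullwidth digits, fullwidth signs and characters such as
    DIGIT THREE FULL STOP into their ASCII equivalents.
    """
    normalized = unicodedata.normalize('NFKC', text)
    # Assumption: treat MINUS SIGN as a minus. NFKC leaves it untouched
    # because it has no compatibility decomposition.
    normalized = normalized.replace('\N{MINUS SIGN}', '-')
    return float(normalized)


# Fullwidth plus sign followed by fullwidth digits 1, 2, 3.
print(parse_float('\uff0b\uff11\uff12\uff13'))
# MINUS SIGN followed by fullwidth digits 1, 2, 3.
print(parse_float('\u2212\uff11\uff12\uff13'))
```

Whether such folding belongs inside float() itself or in caller code is exactly the kind of question the comment suggests settling in a broader proposal.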